On Fri, Apr 17, 2015 at 02:09:07AM -0700, Walter Bright via Digitalmars-d wrote: > Challenge level - Moderately easy > > Consider the function std.string.wrap: > > http://dlang.org/phobos/std_string.html#.wrap > > It takes a string as input, and returns a GC allocated string that is > word-wrapped. It needs to be enhanced to: > > 1. Accept a ForwardRange as input. > 2. Return a lazy ForwardRange that delivers the characters of the > wrapped result one by one on demand. > 3. Not allocate any memory. > 4. The element encoding type of the returned range must be the same as > the element encoding type of the input.
This is harder than it looks at first sight, actually. Mostly thanks to the complexity of Unicode... you need to identify zero-width, normal-width, and double-width characters, combining diacritics, various kinds of spaces (e.g. cannot break on non-breaking space) and treat them accordingly. Which requires decoding. (Well, in theory std.uni could be enhanced to work directly with encoded data, but right now it doesn't. In any case this is outside the scope of this challenge, I think.) Unfortunately, the only reliable way I know of currently that can deal with the spacing of Unicode characters correctly is to segment the input with byGrapheme, which currently is GC-dependent. So this fails (3). There's also the question of what to do with bidi markings: how do you handle counting the columns in that case? Of course, if you forego Unicode correctness, then you *could* just word-wrap on a per-character basis (i.e., every character counts as 1 column), but this also makes the resulting code useless as far as dealing with general Unicode data is concerned -- it'd only work for ASCII, and various character ranges inherited from the old 8-bit European encodings. Not to mention, line-breaking in Chinese encodings cannot work as prescribed anyway, because the rules are different (you can break anywhere at a character boundary except punctuation -- there is no such concept as a space character in Chinese writing). Same applies for Korean/Japanese. So either you have to throw out all pretenses of Unicode-correctness and just stick with ASCII-style per-character line-wrapping, or you have to live with byGrapheme with all the complexity that it entails. The former is quite easy to write -- I could throw it together in a couple o' hours max, but the latter is a pretty big project (cf. Unicode line-breaking algorithm, which is one of the TR's). T -- All problems are easy in retrospect.