Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On Wednesday, 18 April 2012 at 05:45:06 UTC, Jakob Ovrum wrote:
> On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
>>
>> The reason is, a sequence of UTF-8 code units is not valid
>> UTF-8 when reversed (or retro'ed :p). But a dchar array can be
>> reversed.
>>
>> Ali
>
> It is absolutely possible to walk a UTF-8 string backwards.

Indeed. I didn't mean otherwise. I was trying to explain why "The type of the return expression is dstring, not string."

And I just checked, again, that my use of "UTF-8 code units" above was correct. :) I didn't say "Unicode code points".

Ali
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On Tuesday, 17 April 2012 at 15:18:49 UTC, bearophile wrote:
> Jakob Ovrum:
>
>> return array(strippedTail);
>> }
>>
>> The type of the return expression is dstring, not string.
>>
>> What is the most elegant way or correct way to solve this friction?
>>
>> (Note: the function is used in CTFE)
>
> Try "text" instead of "array".
>
> Bye,
> bearophile

Thanks, that did it :)

(I also forgot to retro() a second time to make it build the array in the original direction, before anyone points it out)
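For readers following along, here is a minimal sketch of what the function might look like with both fixes from this post applied: text() instead of array(), and a second retro() to restore the original order. It assumes the input contains both a '(' and a ')'. The sample input in main() is made up for illustration, and the sketch does not try to verify CTFE behaviour.

```d
import std.algorithm : find;
import std.conv : text;
import std.range : retro;

// Sketch of the function from the thread with both fixes applied.
// Assumption: typestr contains at least one '(' followed by a ')'.
string findParameterList(string typestr)
{
    // Everything after the first opening parenthesis.
    auto strippedHead = typestr.find('(')[1 .. $];

    // Walk from the back to the last closing parenthesis and drop it.
    auto strippedTail = retro(strippedHead).find(')');
    strippedTail.popFront();

    // retro() again restores the original direction; text() yields a string.
    return text(retro(strippedTail));
}

void main()
{
    // Hypothetical input, chosen only to exercise the function.
    assert(findParameterList("void function(int a, string b)") == "int a, string b");
}
```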
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
> The reason is, a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p). But a dchar array can be reversed.
>
> Ali

It is absolutely possible to walk a UTF-8 string backwards. The problem here is that arrays of char are ranges of dchar; hence you can't go the regular generic path and have to use text() instead.
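The point that arrays of char are ranges of dchar can be checked at compile time. The following is only an illustrative sketch that assumes Phobos's auto-decoding range primitives for narrow strings; the sample string is made up.

```d
import std.array : array;
import std.range : ElementEncodingType, ElementType, retro;

void main()
{
    // A string is stored as UTF-8 code units...
    static assert(is(ElementEncodingType!string == immutable(char)));
    // ...but as a range it yields decoded code points.
    static assert(is(ElementType!string == dchar));

    // So collecting retro() of a string with array() gives an array of
    // dchar, not a string.
    auto reversed = array(retro("añb"));
    static assert(is(typeof(reversed) : const(dchar)[]));
    static assert(!is(typeof(reversed) : string));
    assert(reversed == "bña"d);
}
```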
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On 04/17/2012 09:12 AM, Timon Gehr wrote:
> On 04/17/2012 06:09 PM, Ali Çehreli wrote:
>> The algorithm must be building a local string.
> It does not have to build a local string, see
> http://dlang.org/phobos/std_utf.html#strideBack

I never said otherwise. :p

I had been too lazy to locate where 2.059's algorithm.d was installed. Apparently it is here:

    /usr/include/x86_64-linux-gnu/dmd/phobos/std/algorithm.d

The algorithm is smart. It first reverses the code units of each multi-byte character in place, and then reverses the whole string one last time:

    void reverse(Char)(Char[] s)
    if (isNarrowString!(Char[]) && !is(Char == const) && !is(Char == immutable))
    {
        auto r = representation(s);
        for (size_t i = 0; i < s.length; )
        {
            immutable step = std.utf.stride(s, i);
            if (step > 1)
            {
                .reverse(r[i .. i + step]);
                i += step;
            }
            else
            {
                ++i;
            }
        }
        reverse(r);
    }

Ali

P.S. Being a C++ programmer, exception-safety is always warm in my mind. Unfortunately the topic does not come up much in D forums. The algorithm above is not exception-safe because stride() may throw. But this is way off topic on this thread. :)
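A small usage sketch of the in-place reverse() discussed above, assuming the narrow-string overload in std.algorithm behaves as the quoted source suggests. The helper name reversedCopy is hypothetical and exists only for this example.

```d
import std.algorithm : reverse;

// Hypothetical helper: returns a reversed copy of a UTF-8 string.
string reversedCopy(string s)
{
    char[] buf = s.dup; // reverse() needs mutable code units
    reverse(buf);       // reverses in place, keeping multi-byte characters intact
    return buf.idup;
}

void main()
{
    // 'ñ' occupies two code units in UTF-8 and survives the reversal.
    assert(reversedCopy("añb") == "bña");
}
```

For "añb" the code units are 61 C3 B1 62. The first pass flips the two-byte sequence to B1 C3, and the final whole-array reversal turns 61 B1 C3 62 into 62 C3 B1 61, which is "bña".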
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On 04/17/2012 06:09 PM, Ali Çehreli wrote:
> On 04/17/2012 08:58 AM, bearophile wrote:
>> Ali Çehreli:
>>
>>> The reason is, a sequence of UTF-8 code units is not valid UTF-8
>>> when reversed (or retro'ed :p).
>>
>> But reversed(char[]) now works :-)
>
> That's pretty cool. :) (You meant reverse()).
>
> Interesting, because there could be no other way anyway because reverse() is in-place. Iterating by dchar without damaging the other end must have been challenging because the first half of the string may consist entirely of multi-byte UTF-8 sequences and the rest entirely of single bytes.
>
> The algorithm must be building a local string.
>
>> Bye,
>> bearophile
>
> Ali

It does not have to build a local string, see
http://dlang.org/phobos/std_utf.html#strideBack
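As an illustration of the strideBack() approach linked above, here is a rough sketch that walks a UTF-8 string from the end without building a decoded copy. The helper name codePointsBackward is hypothetical. Note that strideBack() can throw on invalid UTF-8.

```d
import std.utf : strideBack;

// Hypothetical helper: slices of `s`, one per code point, last one first.
string[] codePointsBackward(string s)
{
    string[] pieces;
    size_t i = s.length;
    while (i > 0)
    {
        // Number of code units in the sequence that ends just before index i.
        immutable step = strideBack(s, i);
        pieces ~= s[i - step .. i]; // a slice into s, not a copy of its data
        i -= step;
    }
    return pieces;
}

void main()
{
    assert(codePointsBackward("añb") == ["b", "ñ", "a"]);
}
```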
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On 04/17/2012 08:58 AM, bearophile wrote:
> Ali Çehreli:
>
>> The reason is, a sequence of UTF-8 code units is not valid UTF-8
>> when reversed (or retro'ed :p).
>
> But reversed(char[]) now works :-)

That's pretty cool. :) (You meant reverse()).

Interesting, because there could be no other way anyway because reverse() is in-place. Iterating by dchar without damaging the other end must have been challenging because the first half of the string may consist entirely of multi-byte UTF-8 sequences and the rest entirely of single bytes.

The algorithm must be building a local string.

> Bye,
> bearophile

Ali
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
Ali Çehreli:

> The reason is, a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p).

But reversed(char[]) now works :-)

Bye,
bearophile
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
On 04/17/2012 08:12 AM, Jakob Ovrum wrote:
> Consider this simple function:
>
> private string findParameterList(string typestr)
> {
>     auto strippedHead = typestr.find("(")[1 .. $];
>     auto strippedTail = retro(strippedHead).find(")");
>
>     strippedTail.popFront(); // slice off closing parenthesis
>
>     return array(strippedTail);
> }
>
> The type of the return expression is dstring, not string.

The reason is, a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p). But a dchar array can be reversed.

Ali
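To make the code-unit point concrete, here is a small sketch that reverses the raw bytes of a UTF-8 string and then checks the result with std.utf.validate. The helper isValidUtf8 is hypothetical and exists only for this illustration.

```d
import std.algorithm : reverse;
import std.utf : UTFException, validate;

// Hypothetical helper: true if the bytes form valid UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    ubyte[] bytes = cast(ubyte[]) "añ".dup; // code units 61 C3 B1
    assert(isValidUtf8(bytes));

    reverse(bytes);                          // now B1 C3 61
    // A continuation byte (B1) can no longer start a sequence.
    assert(!isValidUtf8(bytes));
}
```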
Re: retro() on a `string` creates a range of `dchar`, causing array() pains
Jakob Ovrum:

> return array(strippedTail);
> }
>
> The type of the return expression is dstring, not string.
>
> What is the most elegant way or correct way to solve this friction?
>
> (Note: the function is used in CTFE)

Try "text" instead of "array".

Bye,
bearophile