Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-18 Thread Ali Çehreli

On Wednesday, 18 April 2012 at 05:45:06 UTC, Jakob Ovrum wrote:
> On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
>>
>> The reason is, a sequence of UTF-8 code units is not valid
>> UTF-8 when reversed (or retro'ed :p). But a dchar array can be
>> reversed.
>>
>> Ali
>
> It is absolutely possible to walk a UTF-8 string backwards.

Indeed. I didn't mean otherwise. I was trying to explain why "The type 
of the return expression is dstring, not string."


And I just checked, again, that my use of "UTF-8 code units" above was 
correct. :) I didn't say "Unicode code points".


Ali



Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread Jakob Ovrum

On Tuesday, 17 April 2012 at 15:18:49 UTC, bearophile wrote:

Jakob Ovrum:


    return array(strippedTail);
}

The type of the return expression is dstring, not string.

What is the most elegant or correct way to resolve this 
friction?


(Note: the function is used in CTFE)


Try "text" instead of "array".

Bye,
bearophile


Thanks, that did it :)

(I also forgot to retro() a second time to make it build the 
array in the original direction, before anyone points it out)
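For reference, here is a sketch of the function with both fixes applied: text() instead of array(), plus the second retro(). It assumes the input always contains a '(' and a later ')'. The single-character needles and the current Phobos module names (std.algorithm.searching) are my choices, not necessarily what Jakob ended up with:

```d
import std.algorithm.searching : find;
import std.conv : text;
import std.range : retro;

// Returns the text between the first '(' and the last ')' of typestr.
// Assumes both parentheses are present; otherwise the [1 .. $] slice
// or popFront() fails at run time.
string findParameterList(string typestr)
{
    auto strippedHead = typestr.find('(')[1 .. $];     // skip past the first '('
    auto strippedTail = retro(strippedHead).find(')'); // walk back to the last ')'
    strippedTail.popFront();                           // drop that ')'

    // Reverse again so the result reads forwards, and let text() produce
    // UTF-8 instead of the dchar[] that array() would produce.
    return text(retro(strippedTail));
}

void main()
{
    assert(findParameterList("void function(int a, string b)") == "int a, string b");
    assert(findParameterList("void delegate(int delegate(int) dg)") == "int delegate(int) dg");
}
```

Because retro(retro(r)) hands back the original slice, the final text() call here actually receives a plain string, so no decoding or re-encoding takes place.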


Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread Jakob Ovrum

On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:


The reason is, a sequence of UTF-8 code units is not valid 
UTF-8 when reversed (or retro'ed :p). But a dchar array can be 
reversed.


Ali


It is absolutely possible to walk a UTF-8 string backwards.

The problem here is that arrays of char are ranges of dchar; 
hence you can't go the regular generic path and have to use 
text() instead.
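To make the point concrete, the following is a small check of the element and result types. It is written against current Phobos module names (in 2012 ElementType lived directly in std.range):

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.range.primitives : ElementType;

void main()
{
    string s = "héllo";

    // A string is an array of char, but as a range it yields decoded dchars.
    static assert(is(ElementType!string == dchar));

    // So retro() walks it backwards one code point at a time.
    auto r = retro(s);
    static assert(is(ElementType!(typeof(r)) == dchar));

    // The generic array() therefore collects dchars...
    auto a = array(r);
    static assert(is(typeof(a) == dchar[]));
    assert(a == "olléh"d);

    // ...while text() re-encodes the same range as UTF-8.
    string t = text(r);
    assert(t == "olléh");
}
```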


Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread Ali Çehreli

On 04/17/2012 09:12 AM, Timon Gehr wrote:
> On 04/17/2012 06:09 PM, Ali Çehreli wrote:

>> The algorithm must be building a local string.

> It does not have to build a local string, see
> http://dlang.org/phobos/std_utf.html#strideBack

I never said otherwise. :p

I was too lazy to look up where 2.059's algorithm.d was installed. 
Apparently it is here:


  /usr/include/x86_64-linux-gnu/dmd/phobos/std/algorithm.d

The algorithm is smart. It first reverses the code units of each multi-byte 
character in place and then reverses the whole string one last time:


void reverse(Char)(Char[] s)
if (isNarrowString!(Char[]) && !is(Char == const) && !is(Char == immutable))
{
    auto r = representation(s);
    // Pass 1: reverse the code units of each multi-byte character in place.
    for (size_t i = 0; i < s.length; )
    {
        immutable step = std.utf.stride(s, i);
        if (step > 1)
        {
            .reverse(r[i .. i + step]);
            i += step;
        }
        else
        {
            ++i;
        }
    }
    // Pass 2: reverse the whole array of code units.
    reverse(r);
}
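For comparison, the following calls the library function on a mutable char[] (current Phobos keeps it in std.algorithm.mutation). The comments trace the two passes and assume the implementation still matches the 2.059 code quoted above:

```d
import std.algorithm.mutation : reverse;

void main()
{
    // "aéz" is stored as the code units 'a', 0xC3, 0xA9, 'z'.
    char[] s = "aéz".dup;

    // Pass 1 flips the bytes inside each multi-byte character: 'a', 0xA9, 0xC3, 'z'.
    // Pass 2 flips the whole array:                            'z', 0xC3, 0xA9, 'a'.
    // The two-byte 'é' ends up back in its correct byte order.
    reverse(s);
    assert(s == "zéa");
}
```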

Ali

P.S. As a C++ programmer, I always have exception safety in mind. 
Unfortunately the topic does not come up much in the D forums. The algorithm 
above is not exception-safe because stride() may throw, leaving the string 
partially modified. But this is way off topic for this thread. :)
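One way to get the strong guarantee is to validate before mutating. Here is a minimal sketch; reverseChecked is a hypothetical name, not a Phobos function. It assumes reverse() only throws on malformed UTF-8, and the price is one extra pass over the string:

```d
import std.algorithm.mutation : reverse;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: validate() throws on malformed UTF-8 before any
// byte is touched, so on failure the caller's array is left unchanged.
void reverseChecked(char[] s)
{
    validate(s);
    reverse(s);
}

void main()
{
    char[] good = "héllo".dup;
    reverseChecked(good);
    assert(good == "olléh");

    // 'é' (0xC3 0xA9) followed by a truncated sequence (a lone lead byte 0xC3).
    char[] bad = [cast(char) 0xC3, cast(char) 0xA9, cast(char) 0xC3];
    assertThrown!UTFException(reverseChecked(bad));
    assert(bad == [cast(char) 0xC3, cast(char) 0xA9, cast(char) 0xC3]); // untouched
}
```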




Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread Timon Gehr

On 04/17/2012 06:09 PM, Ali Çehreli wrote:

On 04/17/2012 08:58 AM, bearophile wrote:
 > Ali Çehreli:
 >
 > The reason is, a sequence of UTF-8 code units is not valid UTF-8
 > when reversed (or retro'ed :p).
 >
 > But reversed(char[]) now works :-)

That's pretty cool. :) (You meant reverse()).

Interesting, because since reverse() is in-place, there could be no other
way anyway. Iterating by dchar without damaging the other end must have
been challenging, because the first half of the string may have been all
multi-byte UTF-8 sequences and the rest all single bytes.

The algorithm must be building a local string.

 > Bye,
 > bearophile

Ali



It does not have to build a local string, see
http://dlang.org/phobos/std_utf.html#strideBack
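strideBack(str, index) returns how many code units the character ending just before index occupies. That is what lets code walk UTF-8 backwards without decoding into a dchar buffer. The following is a sketch; reverseByCodePoint is a made-up name, and it returns a fresh copy rather than reversing in place:

```d
import std.utf : strideBack;

// Copies s with its code points in reverse order, walking the UTF-8
// backwards with strideBack(); no dchar[] is ever built.
string reverseByCodePoint(string s)
{
    auto result = new char[](s.length);
    size_t pos = 0;        // next free slot in result
    size_t i = s.length;   // end (exclusive) of the next code point to copy
    while (i > 0)
    {
        immutable step = strideBack(s, i); // code units of the char ending at i
        result[pos .. pos + step] = s[i - step .. i];
        pos += step;
        i -= step;
    }
    return result.idup;
}

void main()
{
    assert(reverseByCodePoint("héllo") == "olléh");
    assert(reverseByCodePoint("") == "");
}
```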


Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread Ali Çehreli

On 04/17/2012 08:58 AM, bearophile wrote:
> Ali Çehreli:
>
>> The reason is, a sequence of UTF-8 code units is not valid UTF-8
>> when reversed (or retro'ed :p).
>
> But reversed(char[]) now works :-)

That's pretty cool. :) (You meant reverse()).

Interesting, because since reverse() is in-place, there could be no other 
way anyway. Iterating by dchar without damaging the other end must have 
been challenging, because the first half of the string may have been all 
multi-byte UTF-8 sequences and the rest all single bytes.


The algorithm must be building a local string.

> Bye,
> bearophile

Ali



Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread bearophile

Ali Çehreli:

The reason is, a sequence of UTF-8 code units is not valid 
UTF-8 when reversed (or retro'ed :p).


But reversed(char[]) now works :-)

Bye,
bearophile


Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread Ali Çehreli

On 04/17/2012 08:12 AM, Jakob Ovrum wrote:
> Consider this simple function:
>
> private string findParameterList(string typestr)
> {
>     auto strippedHead = typestr.find("(")[1 .. $];
>     auto strippedTail = retro(strippedHead).find(")");
>
>     strippedTail.popFront(); // slice off closing parenthesis
>
>     return array(strippedTail);
> }
>
> The type of the return expression is dstring, not string.

The reason is, a sequence of UTF-8 code units is not valid UTF-8 when 
reversed (or retro'ed :p). But a dchar array can be reversed.
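The following small D program shows both cases side by side. It uses current Phobos module names:

```d
import std.algorithm.mutation : reverse;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, validate;

void main()
{
    // 'é' is the two code units 0xC3 0xA9. Reversing the raw bytes turns it
    // into 0xA9 0xC3: a continuation byte with no lead byte, i.e. invalid UTF-8.
    char[] units = "héllo".dup;
    reverse(units.representation);
    assertThrown!UTFException(validate(units));

    // Reversing decoded code points (dchar) has no such problem.
    dchar[] points = "héllo"d.dup;
    reverse(points);
    assert(points == "olléh"d);
}
```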


Ali



Re: retro() on a `string` creates a range of `dchar`, causing array() pains

2012-04-17 Thread bearophile

Jakob Ovrum:


    return array(strippedTail);
}

The type of the return expression is dstring, not string.

What is the most elegant or correct way to resolve this 
friction?


(Note: the function is used in CTFE)


Try "text" instead of "array".

Bye,
bearophile