Re: Latest string_token Code

Ben Hanson Tue, 22 Jun 2010 09:00:20 -0700

== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
> On 06/22/2010 08:13 AM, Ben Hanson wrote:
> > Here's the latest with naming convention (hopefully) followed. I've
> > implemented my own squeeze() function and used sizeof in the memmove calls.
> I suggest you look into using the range primitives (empty, front,
> back, popFront, and popBack) with strings of any width. Your code
> assumes that all characters have the same width and therefore will
> behave erratically on UTF-8 and UTF-16 encodings.
> In the particular case of squeeze(), you may want to use uniq instead,
> which works on any forward range and will therefore decode characters
> properly:
> http://www.digitalmars.com/d/2.0/phobos/std_algorithm.html#uniq
> Andrei


OK, thanks.

Don't forget these are regular expressions though. I was wondering whether
people really want to pass regular expressions UTF encoded, but I suppose it
could happen. It's certainly a good idea to get used to using UTF compatible
functions anyway.
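
The code-unit problem raised in the quoted reply can be seen in a rough
sketch. It is written in Python rather than D purely for illustration, and
squeeze() here is a hypothetical stand-in, not the function from the posted
code. It collapses runs of equal adjacent elements, once over code points and
once over raw UTF-8 bytes. U+3041 is used because its UTF-8 encoding
(E3 81 81) happens to contain two equal adjacent bytes.

```python
from itertools import groupby


def squeeze(seq):
    """Collapse each run of equal adjacent elements to a single element."""
    return [key for key, _ in groupby(seq)]


# "aa", then U+3041 (UTF-8 bytes E3 81 81), then "b".
text = "aa\u3041b"

# Squeezing decoded code points only merges the repeated "a".
by_code_point = "".join(squeeze(text))

# Squeezing raw UTF-8 code units also merges the two 0x81 bytes
# that belong to the single character U+3041.
by_code_unit = bytes(squeeze(text.encode("utf-8")))

try:
    by_code_unit.decode("utf-8")
    still_valid_utf8 = True
except UnicodeDecodeError:
    still_valid_utf8 = False
```

In this sketch the code-point version leaves U+3041 intact, while the
code-unit version drops one of its bytes and the result no longer decodes as
UTF-8.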

Is there any support for Unicode continuation characters yet? Do you agree
that (ideally) Unicode text should be normalised before searching?
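
A small sketch of why normalisation matters for searching, assuming the
question is about combining characters. It is in Python rather than D, using
the standard unicodedata module, and the strings and variable names are made
up for the example. The same visible text is spelled once with a precomposed
U+00E9 and once as "e" followed by the combining acute accent U+0301.

```python
import unicodedata


def nfc(s):
    """Return s in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", s)


needle = "\u00e9"          # precomposed e-acute
haystack = "cafe\u0301"    # "cafe" followed by combining acute accent

# A raw code-point search misses the match: the sequences differ
# even though both render as the same text.
raw_hit = needle in haystack

# After normalising both sides to the same form, the search succeeds.
normalised_hit = nfc(needle) in nfc(haystack)
```

Here raw_hit is False and normalised_hit is True, which is the argument for
normalising both the pattern and the input before matching.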

Regards,

Ben
