I don't know that Unicode expertise is really required here anyway. All one
has to know is that UTF-8 is a multibyte encoding and that built-in string
properties such as .length count bytes (code units), not characters. Knowing
when one wants bytes vs. characters isn't rocket science.
That said, I'm on the fence about this change. It breaks consistency for a
benefit I'm still weighing. With this change, the char type will still be a
single byte, correct? What happens to foreach on strings?
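The distinction can be made concrete with a short sketch. It is written in Python rather than D, so `len()` on the UTF-8 encoded bytes stands in for D's `.length` on a `string`, and `len()` on the decoded text stands in for counting decoded code points. The sample strings and the `split_on_ascii` helper are hypothetical, for illustration only. The helper also shows why scanning code units directly is safe when you only need ASCII delimiters: every byte of a multibyte UTF-8 sequence is 0x80 or above, so it can never be mistaken for an ASCII byte.

```python
# Illustrative sketch in Python, not D. Names and sample data are hypothetical.
text = "naïve café"
utf8 = text.encode("utf-8")

byte_count = len(utf8)   # code units (bytes), analogous to D's s.length
char_count = len(text)   # code points, what decoding yields


def split_on_ascii(data, delim):
    """Split UTF-8 bytes on an ASCII delimiter without decoding.

    Safe because bytes belonging to multibyte sequences are always >= 0x80,
    while ASCII delimiters are < 0x80, so they can never match by accident.
    """
    parts = []
    start = 0
    for i, b in enumerate(data):
        if b == delim:
            parts.append(data[start:i])
            start = i + 1
    parts.append(data[start:])
    return parts


# Split on ';' by looking only at code units; decode each field afterwards.
fields = [f.decode("utf-8")
          for f in split_on_ascii("clé=valeur;名前=値".encode("utf-8"), ord(";"))]
```

Here `byte_count` is 12 while `char_count` is 10, because "ï" and "é" each take two bytes. Only the final fields are decoded, which is the "decode only where necessary" approach described in the quoted message.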


On Dec 31, 2011, at 8:20 AM, Timon Gehr <timon.g...@gmx.ch> wrote:

> On 12/31/2011 03:17 PM, Michel Fortin wrote:
>> 
>> As for Walter being the only one coding by looking at the code units
>> directly, that's not true. All my parser code looks at code units
>> directly and only decodes to code points where necessary (just look at
>> the XML parsing code I posted a while ago to get an idea of how it can
>> apply to ranges). And I don't think it's because I've seen Walter's code
>> before; I think it is because I know how Unicode works and I want to
>> make my parser efficient. I did the same for a parser in C++ a while
>> ago. I can hardly imagine I'm the only one (besides Walter and you). I
>> think this is how efficient algorithms dealing with Unicode should be
>> written.
>> 
> 
> +1.
