I don't know that Unicode expertise is really required here anyway. All one has to know is that UTF-8 is a multibyte encoding and that the built-in string attributes (such as .length) count bytes, not characters. Knowing when one wants bytes and when one wants characters isn't rocket science.

That said, I'm on the fence about this change. It breaks consistency for a benefit I'm still weighing. With this change, will the char type still be a single byte? And what happens to foreach on strings?
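To make the bytes-versus-characters distinction concrete, here is a minimal sketch. It uses Python rather than D, because the point applies to any UTF-8 text. `utf8_stats` and `code_units_per_point` are hypothetical helpers made up for illustration. `utf8_stats` reports the UTF-8 byte count, which is what `.length` measures on a D `string`, next to the code point count.

```python
# Illustration in Python (not D): compare UTF-8 code units (bytes)
# with code points for the same text. Helper names are hypothetical.

def utf8_stats(text):
    """Return (UTF-8 byte count, code point count) for text."""
    encoded = text.encode("utf-8")  # the UTF-8 code units
    return len(encoded), len(text)  # len() of a Python str counts code points


def code_units_per_point(text):
    """Return how many UTF-8 bytes each code point of text occupies."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    word = "na\u00efve"  # "naive" spelled with U+00EF (i with diaeresis)
    print(utf8_stats(word))            # (6, 5)
    print(code_units_per_point(word))  # [1, 1, 2, 1, 1]
```

ASCII text gives equal counts. The two numbers diverge only once non-ASCII characters appear, and that is where choosing between bytes and characters starts to matter.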
On Dec 31, 2011, at 8:20 AM, Timon Gehr <timon.g...@gmx.ch> wrote:

> On 12/31/2011 03:17 PM, Michel Fortin wrote:
>>
>> As for Walter being the only one coding by looking at the code units
>> directly, that's not true. All my parser code looks at code units
>> directly and only decodes to code points where necessary (just look at
>> the XML parsing code I posted a while ago to get an idea of how it can
>> apply to ranges). And I don't think it's because I've seen Walter code
>> before, I think it is because I know how Unicode works and I want to
>> make my parser efficient. I've done the same for a parser in C++ a while
>> ago. I can hardly imagine I'm the only one (with Walter and you). I
>> think this is how efficient algorithms dealing with Unicode should be
>> written.
>>
>
> +1.
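Here is a rough sketch of the "look at code units, decode only where necessary" approach Michel describes. It is a Python illustration under stated assumptions. It is not his XML parser and not a D library API, and `tag_names` is a hypothetical helper. The approach relies on one UTF-8 property: every byte of a multibyte sequence is 0x80 or higher, so ASCII delimiters such as `<` and `>` can never appear inside one. The scan can therefore compare raw bytes and decode only the slices it actually returns.

```python
# Sketch: scan UTF-8 bytes directly and decode only the tag names.
# Assumption: simplified markup only (no comments, CDATA, or '>' inside
# attribute values). tag_names is a hypothetical helper, not a real API.

LT = ord("<")  # 0x3C, never part of a UTF-8 multibyte sequence
GT = ord(">")  # 0x3E, likewise
NAME_STOP = b" \t\r\n/"  # bytes that end a tag name


def tag_names(data):
    """Return names of '<name ...>' tags in UTF-8 bytes; closing tags are skipped."""
    names = []
    i = 0
    while True:
        start = data.find(LT, i)  # bytes.find accepts an int byte value
        if start == -1:
            break
        end = data.find(GT, start + 1)
        if end == -1:
            break  # unterminated tag: stop scanning
        j = start + 1
        # Advance over raw code units until whitespace, '/' or the closing '>'.
        while j < end and data[j] not in NAME_STOP:
            j += 1
        if j > start + 1:
            # Decode only this slice; the rest of the input stays undecoded.
            names.append(data[start + 1:j].decode("utf-8"))
        i = end + 1
    return names


if __name__ == "__main__":
    doc = "<p>caf\u00e9</p><\u00e9l\u00e9ment a='x'><br/>".encode("utf-8")
    print(tag_names(doc))
```

The byte comparisons never need to know where code points begin or end. Decoding happens once per extracted name rather than once per character of input, which is the efficiency argument made in the quoted message.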