On Saturday, March 03, 2012 21:05:40 Timon Gehr wrote:
> On 03/03/2012 08:46 PM, Jonathan M Davis wrote:
> > On Saturday, March 03, 2012 18:38:44 Timon Gehr wrote:
> >> On 03/03/2012 09:40 AM, Jonathan M Davis wrote:
> >>> ... but operating on
> >>> code points is _far_ more correct than operating on code units. It's
> >>> also more efficient.
> >>> [snip.]
> >>
> >> No, it is less efficient.
> >
> > Operating on code points is more efficient than operating on graphemes is
> > what I meant. I can see that I wasn't clear enough on that.
>
> Makes sense.
>
> > It's more correct than operating on code units and less correct than
> > operating on graphemes, while it's less efficient than operating on code
> > units and more efficient than operating on graphemes.
> >
> > - Jonathan M Davis
>
> When the code actually only cares about some characters that have 7-bit
> ASCII values, most of the time there are no correctness issues when
> operating on code units directly.

True, but writing code without caring about Unicode frequently leads to bugs
when you actually _do_ have to deal with Unicode (the fact that American
programmers run into Unicode less often just makes it worse, because they're
less likely to catch their bugs), and char is UTF-8 by definition. So,
operating specifically on ASCII is an optimization and should be coded for
specifically rather than being generally encouraged.

Having ranges over strings be ranges of code units rather than code points
would encourage incorrect usage. The current solution encourages correct
usage (or at least usage which is closer to correct, since it still isn't at
the grapheme level) without disallowing more optimized code.

- Jonathan M Davis
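
To make the three levels concrete, here is a minimal sketch in D. It assumes a Phobos recent enough to provide std.uni.byGrapheme, and countAscii is a hypothetical helper written for this example, not a library function. It shows the same string measured in code units, code points, and graphemes, plus an ASCII-only search that is deliberately coded against code units.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: count occurrences of a 7-bit ASCII character by
// scanning UTF-8 code units directly. This is safe because every byte of
// a multi-byte UTF-8 sequence is >= 0x80, so it cannot match an ASCII value.
size_t countAscii(string s, char c)
{
    assert(c < 0x80, "countAscii is only valid for 7-bit ASCII");
    size_t n = 0;
    foreach (char u; s) // explicit char: iterate code units, no decoding
    {
        if (u == c)
            ++n;
    }
    return n;
}

void main()
{
    // 'e' followed by U+0308 COMBINING DIAERESIS: one visible character
    // made of two code points, the second of which takes two code units.
    string s = "noe\u0308l";

    assert(s.length == 6);                // UTF-8 code units
    assert(s.walkLength == 5);            // code points (range interface)
    assert(s.byGrapheme.walkLength == 4); // graphemes

    // The ASCII-only optimization, opted into explicitly.
    assert(countAscii(s, 'e') == 1);
    assert(countAscii("a,b,c", ',') == 2);
}
```

The point of the sketch is that the code-unit version has to be asked for (via the explicit char in foreach and the ASCII precondition), while the range-based calls operate on code points, and grapheme-level handling is a further, more expensive step on top of that.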