On Saturday, March 03, 2012 21:05:40 Timon Gehr wrote:
> On 03/03/2012 08:46 PM, Jonathan M Davis wrote:
> > On Saturday, March 03, 2012 18:38:44 Timon Gehr wrote:
> >> On 03/03/2012 09:40 AM, Jonathan M Davis wrote:
> >>> ...  but operating on
> >>> code points is _far_ more correct than operating on code units. It's
> >>> also
> >>> more efficient.
> >>> [snip.]
> >> 
> >> No, it is less efficient.
> > 
> > Operating on code points is more efficient than operating on graphemes is
> > what I meant. I can see that I wasn't clear enough on that.
> 
> Makes sense.
> 
> > It's more correct than operating on code units and less correct than
> > operating on graphemes, while it's less efficient than operating on code
> > units and more efficient than operating on graphemes.
> > 
> > - Jonathan M Davis
> 
> When the code actually only cares about some characters that have 7-bit
> ASCII values, most of the time there are no correctness issues when
> operating on code units directly.

True, but writing code without caring about Unicode frequently leads to bugs
when you actually _do_ have to deal with Unicode (the fact that an American
programmer runs into Unicode less just makes it worse, because they're less
likely to catch their bugs), and char is a UTF-8 code unit by definition.
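
To illustrate the kind of bug meant here, a minimal sketch follows. It is in
Python rather than D, purely for illustration, and the helper names are
hypothetical. Truncating by code units looks fine on ASCII test data and
breaks on the first non-ASCII input, which is why such bugs go unnoticed:

```python
def truncate_code_units(text: str, n: int) -> bytes:
    """Keep the first n UTF-8 code units (bytes), ignoring code point boundaries."""
    return text.encode("utf-8")[:n]


def truncate_code_points(text: str, n: int) -> bytes:
    """Keep the first n code points, then encode the result as UTF-8."""
    return text[:n].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_only = "hello"
accented = "h\u00e9llo"  # U+00E9 takes two code units in UTF-8

# Pure ASCII input hides the problem: the cut is always on a boundary.
print(is_valid_utf8(truncate_code_units(ascii_only, 2)))
# Here the cut lands in the middle of the two-byte sequence for U+00E9.
print(is_valid_utf8(truncate_code_units(accented, 2)))
# Cutting by code points never splits a sequence.
print(is_valid_utf8(truncate_code_points(accented, 2)))
```

Note that the code point version is still not grapheme-aware; it only avoids
producing invalid UTF-8.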

So, operating on ASCII alone is an optimization and should be coded for
explicitly rather than being generally encouraged. And having ranges over
strings be code units rather than code points would encourage incorrect usage. 
The current solution encourages correct usage (or at least usage which is 
closer to correct, since it still isn't at the grapheme level) without 
disallowing more optimized code.
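
As a rough sketch of what "coding for ASCII explicitly" could look like (again
in Python for illustration only; count_ascii is a hypothetical helper, not
anything from Phobos): the function works on code units directly, but states
and checks the ASCII assumption instead of relying on it silently.

```python
def count_ascii(data: bytes, ch: str) -> int:
    """Count occurrences of one ASCII character by scanning UTF-8 code units.

    This is safe because every byte of a multi-byte UTF-8 sequence is
    >= 0x80, so a byte below 0x80 is always a complete ASCII code point.
    """
    if len(ch) != 1 or ord(ch) >= 0x80:
        raise ValueError("ch must be a single ASCII character")
    return data.count(ch.encode("ascii"))


line = "a,b\u00e9,c".encode("utf-8")
print(count_ascii(line, ","))
```

The check makes the optimization opt-in: a caller who passes a non-ASCII
character gets an error rather than a silently wrong count.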

- Jonathan M Davis
