Michel Fortin Wrote:
> Character literals are treated as simple numbers by the language. By
> that I mean that you can write 'b' - 'a' == 1 and it'll be true.
> Arithmetic makes absolutely no sense for graphemes. If you want a
> special literal for graphemes, I'm afraid you'll have to invent
> something new. And at this point, why not use a string?
>
> > Making a new character or grapheme type which represented a grapheme
> > would be _far_ simpler to understand IMO. However, making it work
> > really well would likely require that the compiler know about the
> > grapheme type like it knows about dchar.
>
> I'm looking for a simple solution. One that doesn't involve inventing a
> new grapheme literal syntax or adding new types the compiler must know
> about. I'm not really opposed to any of this, but the more complicated
> is the solution, the less likely it is to be adopted.
>
> All I'm asking is that Unicode strings behave as Unicode strings should
> behave. Making iteration use graphemes by default and string comparison
> use the normalized form by default seems like a simple way to achieve
> that goal.
>
> The most important is not the implementation, but that the default
> behaviour be the right behaviour.
>
> --
> Michel Fortin
> michel.for...@michelf.com
> http://michelf.com/

I understand your concern regarding a simpler implementation: you want to minimize the disruption caused by the proposed change. I'd argue that creating a specialized string type, as Steve suggests, makes integration *easier*. Your suggestion requires that foreach be changed to default to graphemes. I agree that this can be done, because it will not break silently. But with Steve's string type this is unnecessary, since the type itself would provide a grapheme range interface and the compiler doesn't need to know about this type at all. string becomes a regular library type.

Of course, the type should support:

    string foo = "bar";

by providing an implicit conversion from the current arrays (to minimize compiler changes). The only disruption, as far as I can tell, would be code that uses 'a'-style literals instead of "a", but that will come up at compile time once string defaults to the new type.

Also, all occurrences of:

    string foo = ...;
    foreach (c; foo) {...} // c is now a grapheme

will now do the correct thing by default.
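
To make the idea concrete, here is a rough sketch of what such a library type could look like. It is not Steve's actual design. The name `UString` and the helper `graphemeCount` are made up for illustration, and the sketch leans on `byGrapheme` and `Grapheme` from present-day `std.uni`, which are an assumption here and not part of the proposal in this thread.

```d
import std.range : walkLength;
import std.uni : byGrapheme, Grapheme;

/// Hypothetical library string type; the name `UString` is invented here.
struct UString
{
    private string data; // UTF-8 code units, stored as in a plain array

    /// Lets `UString s = "bar";` compile with no special compiler support.
    this(string s) { data = s; }

    /// foreach over a UString yields graphemes, not code units.
    int opApply(scope int delegate(Grapheme) dg)
    {
        foreach (g; data.byGrapheme)
        {
            if (auto result = dg(g))
                return result; // propagate break/return from the loop body
        }
        return 0;
    }

    /// Number of user-perceived characters (hypothetical helper).
    size_t graphemeCount() { return data.byGrapheme.walkLength; }
}

void main()
{
    // 'e' followed by U+0308 COMBINING DIAERESIS: 5 code points, 4 graphemes.
    UString foo = "noe\u0308l";

    size_t seen = 0;
    foreach (c; foo) // c is inferred as Grapheme
        ++seen;

    assert(seen == 4);
    assert(foo.graphemeCount == 4);
    assert("noe\u0308l".walkLength == 5); // a plain string walks code points
}
```

Everything here lives in the library: the constructor handles initialization from a literal, and `opApply` gives foreach its grapheme-by-default behaviour. Comparison on normalized forms could presumably be layered on in the same way, but that part is left out of the sketch.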