KennyTM~ Wrote:

> On Mar 26, 10 18:52, yigal chripun wrote:
> > KennyTM~ Wrote:
> >
> >> On Mar 26, 10 05:46, yigal chripun wrote:
> >>>
> >>> while it's true that '?' has one unicode value for it, it's not true for
> >>> all sorts of diacritics and combining code-points. So your approach is to
> >>> pass the responsibility for that to the end user which in 99.9999% will
> >>> not handle this correctly.
> >>>
> >>
> >> Non-issue. Since when can a character literal store > 1 code-point?
> >
> > character != code-point
> >
> > D chars are really as you say code-points and not always complete
> > characters.
> >
> > here's a use case for you:
> > you want to write a fully unicode aware search engine.
> > If you just try to match the given sequence of code-points in the search
> > term, you will miss valid matches since, for instance you do not take into
> > account permutations of the order of combining marks.
> > you can't just assume that the code-point value identifies the character.
>
> Stop being off-topic. '?' is of type char, not string. A char always
> holds an octet of UTF-8 encoded sequence. The numerical content is
> unique and well-defined*. Therefore adding 4 to '?' also has a meaning.
>
> * If you're paranoid you may request the spec to ensure the character is
> in NFC form.

Huh? You jump into the middle of a conversation and I'm the one who is
off-topic?

Now, to get back to the topic at hand:

D's current design is: char/wchar/dchar are integral types that can contain
any value/encoding, even though D prefers Unicode. This is not enforced.
E.g. you can have a valid wchar which you increment by 1 and get an invalid
wchar.

Instead, let's have proper, well-defined semantics in D:

Design A:
char/wchar/dchar are defined to be Unicode code-points for the respective
encodings. This is enforced by the language, so if you want to define a
different encoding you must use something like bits!8.
Arithmetic on code-points is defined according to the Unicode standard.

Design B:
char represents a (perhaps multi-byte) character.
Arithmetic on this type is *not* defined.

In either case these types should not be treated as plain integral types.
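
To make the wchar example concrete, here is a minimal sketch. It assumes a
current Phobos where std.utf.validate throws UTFException on malformed
input; isWellFormed is just a helper name made up for this post, not
anything in the library.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if the UTF-16 code units are well-formed.
bool isWellFormed(const(wchar)[] units)
{
    try
    {
        validate(units);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    wchar w = 0xD7FF;            // last value before the surrogate range
    assert(isWellFormed([w]));

    ++w;                         // plain integer arithmetic, now 0xD800
    assert(w == 0xD800);
    assert(!isWellFormed([w]));  // a lone high surrogate is not valid UTF-16
}
```

The compiler accepts the increment without complaint because wchar is
treated as an ordinary integral type; the problem only shows up if someone
remembers to validate afterwards.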