On Fri, 14 Jan 2011 08:59:35 -0500, spir <denis.s...@gmail.com> wrote:

On 01/14/2011 02:37 PM, Steven Schveighoffer wrote:

* I don't even know how to make a grapheme that is more than one
code-unit, let alone more than one code-point :)  Every time I try, I
get 'invalid utf sequence'.

I feel significantly ignorant on this issue, and I'm slowly getting
enough knowledge to join the discussion, but being a dumb American who
only speaks English, I have a hard time grasping how this shit all works.

1. See my text at https://bitbucket.org/denispir/denispir-d/src/c572ccaefa33/U%20missing%20level%20of%20abstraction

I can't read that document; it has a black background with super-dark-grey text.

2.
     writeln ("A\u0308\u0330");
<A + umlaut above + tilde below>
If it does not display properly, either set your terminal to a UTF encoding or use a more Unicode-aware font (e.g. the DejaVu series).

OK, I'll have to remember this so I can use it to test my string type ;)
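
For testing, it may help to see how that one visible character counts at each level. The following is only a sketch, assuming a current Phobos whose std.uni provides byGrapheme (that may not match what was available when this thread was written); the helper names are hypothetical, made up for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers, named here for illustration only.

// UTF-8 code units: the raw length of a D string.
size_t countCodeUnits(string s) { return s.length; }

// Code points: walking a string decodes it into dchar elements.
size_t countCodePoints(string s) { return s.walkLength; }

// Graphemes: a base character together with its combining marks.
size_t countGraphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string s = "A\u0308\u0330";
    writeln(countCodeUnits(s));  // 'A' is 1 byte, each combining mark is 2
    writeln(countCodePoints(s)); // 'A', U+0308, U+0330
    writeln(countGraphemes(s));  // all three form a single grapheme
}
```

So the same string is 5 code units, 3 code points and 1 grapheme, which is exactly the mismatch under discussion.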

The point is not to play with Unicode's flexibility like that, but rather that composite characters are just normal thingies in most of the world's languages. Actually, on this point, English is a rare exception (discarding letters imported from foreign languages, like French 'à'), to the point of being, I guess, the only western language without any diacritics.

Is it common to have multiple modifiers on a single character? The problem I see with using decomposed canonical form for strings is that we would have to return a dchar[] for each 'element', which severely complicates code that, for instance, only expects to handle English.

I was hoping to lazily transform a string into its composed canonical form, allowing the (hopefully rare) exception when a composed character does not exist. My thinking was that this at least gives a useful string representation for 90% of usages, leaving the remaining 10% of usages to find a more complex representation (like your Text type). If we only get like 20% or 30% there by making dchar the element type, then we haven't made it useful enough.
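
To make the "exception" case concrete, here is a rough sketch of composing to canonical form. It assumes a current Phobos with std.uni.normalize, and it normalizes eagerly rather than lazily as proposed above; the helper name is hypothetical.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : normalize, NFC;

// Hypothetical helper: number of code points left after composing (NFC).
size_t composedCodePoints(string s)
{
    return normalize!NFC(s).walkLength;
}

void main()
{
    // A + combining umlaut has a precomposed form (U+00C4),
    // so it collapses to a single code point.
    writeln(composedCodePoints("A\u0308"));

    // Unicode has no precomposed "A with tilde below", so one
    // combining mark survives composition here.
    writeln(composedCodePoints("A\u0308\u0330"));
}
```

The second string still needs two code points after composition, so a dchar element type would not cover it; whether such cases fall in the 10% or are more common is the open question.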

Either way, we need a string type that can be compared canonically for things like searches or opEquals.
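
As a sketch of what such a comparison could look like (not a proposal for the actual string type), one option is to normalize both sides to the same form before comparing. This assumes a current Phobos with std.uni.normalize; canonicallyEqual is a hypothetical name.

```d
import std.stdio : writeln;
import std.uni : normalize, NFD;

// Hypothetical helper: equality after canonical decomposition (NFD).
// It allocates two normalized copies, so it illustrates the semantics
// rather than an efficient implementation.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFD(a) == normalize!NFD(b);
}

void main()
{
    // Precomposed vs. decomposed spelling of the same character.
    writeln("\u00C4" == "A\u0308");                 // raw code-unit comparison
    writeln(canonicallyEqual("\u00C4", "A\u0308")); // canonical comparison

    // The order of marks with different combining classes does not matter.
    writeln(canonicallyEqual("A\u0308\u0330", "A\u0330\u0308"));
}
```

The raw comparison says the strings differ, while the canonical one treats them as equal, which is the behaviour searches and opEquals would need.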

-Steve
