On Fri, 14 Jan 2011 15:54:19 -0500, Gerrit Wichert <gwich...@yahoo.com> wrote:

On 14.01.2011 15:34, Steven Schveighoffer wrote:

Is it common to have multiple modifiers on a single character?  The
problem I see with using decomposed canonical form for strings is that
we would have to return a dchar[] for each 'element', which severely
complicates code that, for instance, only expects to handle English.

I was hoping to lazily transform a string into its composed canonical
form, allowing the (hopefully rare) exception when a composed
character does not exist.  My thinking was that this at least gives a
useful string representation for 90% of usages, leaving the remaining
10% of usages to find a more complex representation (like your Text
type).  If we only get like 20% or 30% there by making dchar the
element type, then we haven't made it useful enough.

I'm afraid that this is not a proper way to handle this problem. It may
be better for a language not to 'translate' by default.
If the user wants to convert the code points, this can be requested on
demand. But premature default conversion is a subtle way to lose
information that may be important.
Imagine we want to write a tool for dealing with the input/output of some
other ignorant legacy software. Even if it is only text files, that
software may choke on some converted input. So I believe that it is very
important that we are able to reproduce strings in exactly the form in
which we read them in.
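
To make the concern concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about D strings). It assumes the input stores "é" in decomposed form and uses the standard `unicodedata` module to show that eager composition changes the bytes a legacy tool would see.

```python
import unicodedata

# "é" stored in decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
original = "e\u0301"

# Eager conversion to composed canonical form (NFC).
composed = unicodedata.normalize("NFC", original)

# The two strings render the same but are encoded differently.
original_bytes = original.encode("utf-8")   # 3 bytes: 'e' + 2-byte combining mark
composed_bytes = composed.encode("utf-8")   # 2 bytes: single code point U+00E9

# Writing `composed` back out would not reproduce the input byte for byte.
round_trips = original_bytes == composed_bytes
```

Here `round_trips` ends up false, which is exactly the kind of silent change that could trip up software expecting the original encoding.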

Actually, this would only lazily *and temporarily* convert the string per grapheme. Essentially, the original is left alone, so no harm there.
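
A rough sketch of what "lazily and temporarily" could look like, written in Python purely for illustration. The function name `composed_clusters` is hypothetical, and the cluster rule (a base character plus any following combining marks) is a deliberate simplification, not full Unicode grapheme segmentation.

```python
import unicodedata
from typing import Iterator


def composed_clusters(text: str) -> Iterator[str]:
    """Lazily yield each cluster of `text` in composed (NFC) form.

    Simplifying assumption: a cluster is a base character followed by
    zero or more combining marks. The input string is never modified.
    """
    cluster = ""
    for ch in text:
        # A non-combining character starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield unicodedata.normalize("NFC", cluster)
            cluster = ""
        cluster += ch
    if cluster:
        yield unicodedata.normalize("NFC", cluster)


# 'e' + acute has a composed form; 'q' + acute does not.
stored = "e\u0301q\u0301"
elements = list(composed_clusters(stored))
```

In this sketch the first element comes out as the single code point U+00E9, while the second stays as two code points because no composed character exists for it. That is the rare exception mentioned earlier in the thread, and `stored` itself is left exactly as it was read.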

-Steve.
