On 4/18/2015 11:28 AM, H. S. Teoh via Digitalmars-d wrote:
On Sat, Apr 18, 2015 at 10:50:18AM -0700, Walter Bright via Digitalmars-d wrote:
On 4/18/2015 4:35 AM, Jacob Carlborg wrote:
\u0301 is the "combining acute accent" [1].

[1] http://www.fileformat.info/info/unicode/char/0301/index.htm

I won't deny what the spec says, but it doesn't make any sense to have
two different representations of eacute, and I don't know why anyone
would use the two code point version.

Well, *somebody* has to convert it to the single code point eacute,
whether it's the human (if the keyboard has a single key for it), or the
code interpreting keystrokes (the user may have typed it as e +
combining acute), or the program that generated the combination, or the
program that receives the data.

Data entry should be handled by the driver program, not a universal interchange format.


When we don't know the provenance of incoming data, we have to assume
the worst and run normalization to be sure that we got it right.
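
To make the normalization point concrete, here is a minimal sketch in Python (not D, and not code from anyone in this thread), using the standard unicodedata module. The helper name same_text is made up for illustration. It shows that the two representations of eacute compare unequal as raw strings and equal once both are normalized to NFC.

```python
import unicodedata

# Two representations of the same user-perceived character.
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: 'e' + COMBINING ACUTE ACCENT


def same_text(a, b):
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A raw comparison works on code points, so the two forms differ.
raw_equal = composed == decomposed

# After normalization the forms are identical.
normalized_equal = same_text(composed, decomposed)

print(len(composed), len(decomposed))   # number of code points in each form
print(raw_equal, normalized_equal)
```

Normalizing at the boundary where data of unknown origin enters the program means the rest of the code can compare strings directly.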

I'm not arguing against the existence of the Unicode standard; I'm saying I can't figure out any justification for standardizing different encodings of the same thing.
