On Saturday, 4 June 2016 at 08:12:47 UTC, Walter Bright wrote:
On 6/3/2016 11:17 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via Digitalmars-d wrote:
It works for books.
Because books don't allow their readers to change the font.
Unicode is not the font.
This madness already exists *without* Unicode. If you have a page
with a single glyph 'm' printed on it and show it to an English
speaker, he will say it's lowercase M. Show it to a Russian speaker,
and he will say it's lowercase Т. So which letter is it, M or Т?
It's not a problem that Unicode can solve. As you said, the
meaning is in the context. Unicode has no context, and tries to
solve something it cannot.
('m' doesn't always mean m in English, either. It depends on
the context.)
Ya know, if Unicode actually solved these problems, you'd have
a case. But it doesn't, and so you don't :-)
If you're going to represent both languages, you cannot get away
from needing to represent letters abstractly, rather than visually.
Books do visually just fine!
So should O and 0 share the same glyph or not? They're visually the
same thing,
No, they're not. Not even on old typewriters where every key
was expensive. Even without the slash, the O tends to be fatter
than the 0.
The very fact that we distinguish between O and 0, independently of
what Unicode did/does, is already proof enough that going by visual
representation is inadequate.
Except that you right now are using a font where they are
different enough that you have no trouble at all distinguishing
them without bothering to look it up. And so am I.
In other words, toUpper and toLower do not belong in the standard
library. Great.
Unicode and the standard library are two different things.
Even if characters in different languages share a glyph or look
identical, though, it makes sense to duplicate them with different
code points/units/whatever.
Simple functions like isCyrillicLetter() can then do a simple
less-than / greater-than comparison instead of having a lookup
table to check different numeric representations scattered
throughout the Unicode table. Functions like toUpper and toLower
become easier to write as well (for SOME languages, anyhow): it's
simply myletter +/- numlettersinalphabet. Redundancy here is very
helpful.
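
To make that concrete, here is a rough sketch in Python (not D, and not how any real library does it). It assumes "Cyrillic letter" means only the basic Russian range U+0410..U+044F, where upper and lower case happen to sit a fixed 0x20 apart. The function names are made up for illustration.

```python
# Assumption: only the basic Russian range U+0410..U+044F is handled.
# Ё/ё (U+0401/U+0451) and the extended Cyrillic letters fall outside it.
UPPER_FIRST = 0x0410  # А
UPPER_LAST = 0x042F   # Я
LOWER_FIRST = 0x0430  # а
LOWER_LAST = 0x044F   # я
CASE_OFFSET = LOWER_FIRST - UPPER_FIRST  # 0x20 in this block


def is_basic_cyrillic_letter(ch: str) -> bool:
    """Range check: one less-than / greater-than comparison, no lookup table."""
    return UPPER_FIRST <= ord(ch) <= LOWER_LAST


def to_upper_basic_cyrillic(ch: str) -> str:
    """Upper-case by subtracting a fixed offset; other characters pass through."""
    cp = ord(ch)
    if LOWER_FIRST <= cp <= LOWER_LAST:
        return chr(cp - CASE_OFFSET)
    return ch


def to_lower_basic_cyrillic(ch: str) -> str:
    """Lower-case by adding the same fixed offset."""
    cp = ord(ch)
    if UPPER_FIRST <= cp <= UPPER_LAST:
        return chr(cp + CASE_OFFSET)
    return ch


latin_m = "m"          # U+006D LATIN SMALL LETTER M
cyrillic_te = "\u0442"  # U+0442 CYRILLIC SMALL LETTER TE

print(is_basic_cyrillic_letter(latin_m))      # Latin m is outside the range
print(is_basic_cyrillic_letter(cyrillic_te))  # Cyrillic т is inside it
print(to_upper_basic_cyrillic(cyrillic_te) == "\u0422")  # т maps to Т
print(to_upper_basic_cyrillic("\u0451") == "\u0451")     # ё is left untouched
```

The last line is the "SOME languages" caveat in action: ё lives outside the contiguous run, so the simple offset trick misses it and a real implementation still needs a special case or a table there.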
Maybe instead of Unicode they should have called it Babel... :)
"The Lord said, “If as one people speaking the same language they
have begun to do this, then nothing they plan to do will be
impossible for them. Come, let us go down and confuse their
language so they will not understand each other.”"
-Jon