Re: Today's programming challenge - How's your Range-Fu ?

Abdulhaq via Digitalmars-d Sun, 19 Apr 2015 00:56:00 -0700

On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:

On 18/04/15 21:40, Walter Bright wrote:
I'm not arguing against the existence of the Unicode standard, I'm saying I can't figure any justification for standardizing different encodings of the same thing.
A lot of areas in Unicode are due to pre-Unicode legacy.
I'm guessing here, but looking at the code points, é (U+00E9, Latin small letter e with acute) comes from Latin-1, which was designed to follow ISO-8859-1. U+0301 (combining acute accent) comes from the "Combining Diacritical Marks" block.
The way I understand things, Unicode would really prefer to use U+0065 U+0301 rather than U+00E9. Because of legacy systems, and because they wanted the ISO-8859 code pages to be 1:1 mappings rather than 1:n mappings, they introduced code points they would really rather do without.
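A minimal Python sketch of the 1:1 versus 1:n point (Python's standard `unicodedata` module is my choice for illustration; nothing in the thread uses it). U+00E9 and U+0065 U+0301 are canonically equivalent: as raw code point sequences they differ, but Unicode normalization maps one onto the other.

```python
import unicodedata

precomposed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE (the Latin-1 legacy form)
decomposed = "\u0065\u0301"  # 'e' + COMBINING ACUTE ACCENT

def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent, i.e. their NFC forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)          # False: different code point sequences
print(len(precomposed), len(decomposed))  # 1 2
# NFD splits the precomposed (1:1 legacy) form into base letter + combining mark.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(canonically_equal(precomposed, decomposed))               # True
```

The helper name `canonically_equal` is hypothetical. Comparing NFC forms is sufficient because two strings are canonically equivalent exactly when their normalized forms are identical.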
This also explains the "presentation forms" code pages (e.g. http://www.unicode.org/charts/PDF/UFB00.pdf). These were intended to be glyphs rather than code points. For legacy reasons, it was not possible to simply discard them. They received code points, with a warning not to use these code points directly.
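As an illustration (again a Python sketch of my own, not from the thread): the chart linked above contains U+FB01 LATIN SMALL LIGATURE FI. Its decomposition is tagged `<compat>`, so canonical normalization (NFC) keeps it as is, and only compatibility normalization (NFKC) turns it back into ordinary letters. That matches the "glyph, not really a code point" status described above.

```python
import unicodedata

ligature = "\ufb01"     # LATIN SMALL LIGATURE FI, from the U+FB00 presentation-forms block
word = ligature + "le"  # "file" spelled with the legacy ligature

print(unicodedata.name(ligature))           # LATIN SMALL LIGATURE FI
print(unicodedata.decomposition(ligature))  # <compat> 0066 0069
print(word == "file")                       # False: a naive comparison misses it
# Canonical normalization leaves the presentation form untouched...
print(unicodedata.normalize("NFC", word) == word)     # True
# ...only compatibility normalization folds it back to plain letters.
print(unicodedata.normalize("NFKC", word) == "file")  # True
```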
Also, notice that some letters can only be written using multiple code points. Hebrew diacritics, for example, typically do not have a composite form. My name fully spelled (which you would rarely do), שַׁחַר, cannot be represented with fewer than 6 code points, despite having only three letters.
The last paragraph isn't strictly true. You can use U+FB2C + U+05B7 for the first letter instead of U+05E9 + U+05C2 + U+05B7. You would be using the presentation form which, as pointed out above, is only there for legacy reasons.
Shachar
or shall I say
שחר
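To check the six-code-point count for the pointed name above, here is a Python sketch that builds it from character names so every code point is explicit. It assumes the fully pointed spelling is shin, patah, shin dot (U+05C1), het, patah, resh. The helper `base_letter_count` is hypothetical and treats canonical combining class 0 as a rough proxy for "letter". The last two lines show why the presentation form is legacy only: NFC never composes Hebrew letters into the U+FB1D..U+FB4F presentation forms, although NFD still decomposes them.

```python
import unicodedata

# Build the pointed name from character names so each code point is explicit.
shin = unicodedata.lookup("HEBREW LETTER SHIN")         # U+05E9
patah = unicodedata.lookup("HEBREW POINT PATAH")        # U+05B7
shin_dot = unicodedata.lookup("HEBREW POINT SHIN DOT")  # U+05C1
het = unicodedata.lookup("HEBREW LETTER HET")           # U+05D7
resh = unicodedata.lookup("HEBREW LETTER RESH")         # U+05E8

pointed = shin + patah + shin_dot + het + patah + resh

def base_letter_count(s: str) -> int:
    """Count code points with combining class 0 (a rough proxy for letters)."""
    return sum(1 for c in s if unicodedata.combining(c) == 0)

print(len(pointed))                # 6 code points
print(base_letter_count(pointed))  # 3 base letters

# NFC does not fold shin + shin dot into a presentation form: those are
# excluded from composition, so the normalized name stays 6 code points.
print(unicodedata.normalize("NFC", pointed) == pointed)  # True
legacy = unicodedata.lookup("HEBREW LETTER SHIN WITH SHIN DOT")  # U+FB2A
print(unicodedata.normalize("NFD", legacy) == shin + shin_dot)   # True
```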


Yes, Arabic is similar too.
