Joerg Schilling wrote:
> Don Cragun <don.cragun at sun.com> wrote:
> > >BTW: Regarding our talk... I checked the POSIX standard and it turns out
> > >that od(1) support for UTF-8 "chars" is fully optional. There is no need to
> > >support it.
> >
> > >Jörg
> >
> > Joerg,
> >       This is only partly true.
> 
> Please also comment on Roland's claim that UNICODE is not a lossless coding.
> Roland mentioned this recently without giving evidence.

There wasn't enough time during our meeting to show the problem in
detail...

> I can hardly believe that the 21 bit coding used by UNICODE still has problems
> to map other codings. UNICODE has been designed to be a lossless coding....

... I'll try to keep it short: Some encodings (e.g. ISO-2022) can define
the language being used for the following characters (similar to the
xml:lang="<lang>" attribute in XML). Since Unicode folds some characters
which are shared between languages into one codepoint (search for
"han-unification"), this information is lost[1], making Unicode not 100%
lossless. That sounds trivial, but it results in some unhappy&&nasty
issues when users mix text from multiple languages. One of the
"harmless" things is that browsers choose fonts based on the language
being used, which may lead to issues like a Japanese font being used for
a single lonely character in the middle of an otherwise completely
Chinese text... and vice versa... (and if you've followed the history of
both countries over the last >= 1500 years you may realise that they
don't like that much...). Unfortunately this hits exactly those
languages where the matching countries are hyper-picky about their
characters (note: that's an understatement).
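
A minimal sketch of the folding described above, in Python. It does not
use a real ISO-2022 stream; it only assumes two separate legacy encodings
(EUC-JP and GB2312) and a sample character of my own choosing (U+76F4).
The bytes differ per national character set, but after conversion to
Unicode both end up as the same codepoint, so the origin can no longer
be told apart:

```python
# Sample character (an assumption for illustration): U+76F4, a unified
# Han ideograph that exists in both the Japanese and Chinese charsets.
ch = "\u76f4"

jp_bytes = ch.encode("euc_jp")   # Japanese legacy encoding (JIS X 0208)
cn_bytes = ch.encode("gb2312")   # Chinese legacy encoding (GB 2312)

# The byte sequences differ, so the encoding itself hints at the language.
print(jp_bytes.hex(), cn_bytes.hex())

# Convert both to Unicode.
from_jp = jp_bytes.decode("euc_jp")
from_cn = cn_bytes.decode("gb2312")

# Both results are the same single codepoint; nothing in the Unicode
# string records which charset (and thus which language) it came from.
same = from_jp == from_cn
print(same, hex(ord(from_jp)))
```

Whatever picks the font afterwards has to guess the language from other
context (document metadata, locale, surrounding text).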

[1]=Technically there are language-selector characters in a block
outside the BMP (= Basic Multilingual Plane), but I'm not sure whether
they are really intended for this use - at least the existing converters
do not use them and I can't find a standard (or even a draft) which
defines their usage. In short: The situation is stuck badly in the mud.
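
For the curious, a small sketch of what such a tag sequence would look
like, assuming the characters meant here are the Plane 14 tag characters
(U+E0001 LANGUAGE TAG followed by "tag" copies of ASCII letters). The
helper name tag_language() is made up for this example, and AFAIK the
Unicode standard itself discourages this use, so treat it as an
illustration only:

```python
import unicodedata

LANGUAGE_TAG = "\U000E0001"  # U+E0001, lives in Plane 14 (outside the BMP)

def tag_language(lang: str) -> str:
    """Hypothetical helper: build a tag sequence for an ASCII language code."""
    # Each ASCII character maps to its tag counterpart at U+E0000 + code.
    return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in lang)

# Prefix the sample ideograph U+76F4 with a "ja" language tag.
tagged = tag_language("ja") + "\u76f4"

# Show the character names of the whole sequence.
names = [unicodedata.name(c) for c in tagged]
print(names)

# All tag characters are above U+FFFF, i.e. outside the BMP.
outside_bmp = all(ord(c) > 0xFFFF for c in tagged[:-1])
print(outside_bmp)
```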

If you want the long story, ask on i18n-discuss@; AFAIK Ienup can explain
all the details better than I can...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
