i was really surprised by this, too.

a couple of months ago i did a bit of work for a company that does
searchable literature.  they were having trouble with "bad unicode".
the problem was stuff like this:

CA: Corporate Author
    Nizhegorodskai͡a͡ gosudarstvennai͡a͡
    selʹskokhozi͡a͡ĭstvennai͡a͡ akademii͡a͡


the character that probably doesn't look right is a combining double
inverted breve (U+0361).  it's actually good data.  i tracked down the cover
of this book and it's really spelled like that.

the problem is that the unicode folk didn't have the foresight to include
precomposed codepoints for stuff like this, so it can only be written as
base letters plus combining characters.
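
a rough sketch of the difference, in python rather than anything plan 9
specific.  it assumes the mark in the example above is U+0361 and that the
romanized "ia" is spelled i, U+0361, a; the variable names are made up for
illustration.  normalizing to NFC folds a + combining acute into the single
codepoint á, but leaves the tied "ia" as three codepoints because there is
nothing precomposed to fold it into:

```python
import unicodedata

# 'a' followed by U+0301 (combining acute accent).
# unicode has a precomposed form for this pair: U+00E1.
decomposed_a = "a\u0301"
composed_a = unicodedata.normalize("NFC", decomposed_a)

# 'i' + U+0361 (combining double inverted breve) + 'a'.
# assumed spelling of the romanized russian "ia" with a tie;
# no precomposed codepoint exists for it.
tied_ia = "i\u0361a"
composed_ia = unicodedata.normalize("NFC", tied_ia)

print(len(composed_a))          # 1
print(len(composed_ia))         # 3
print(composed_ia == tied_ia)   # True
```

so anything that treats one codepoint as one letter gets á right and the
tied "ia" wrong, even though both are perfectly good data.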

- erik

On Fri May 19 17:05:24 CDT 2006, [EMAIL PROTECTED] wrote:
> isn´t there enough space to keep all them there?
> 
> On 5/19/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > á is a single codepoint.  sure.  but there are useful letters that don't
> > exist in unicode unless they are composed.  e.g. romanized russian,
> > accented cyrillic, etc.
> >
> > - erik
> >
> > On Fri May 19 17:00:38 CDT 2006, [EMAIL PROTECTED] wrote:
> > > I think that á is just a single rune, not two different ones composed. If
> > > to type them, you have to type several keys, it´s just a keyboard issue,
> > > isn´t it? I don´t understand why this could go to a upper layer. Is there
> > > any other problem? (besides having to use utf8 for i/o, I mean).
