Shalom, fine folks.

-- Short story: --

When ripping Hebrew CDs, the data I get from CDDB (or freeCDDB, I can't tell) 
is encoded with Aleph as 0xC3A0, Bet as 0xC3A1 and so on.


-- Longer story: --

I was able to convert it into proper utf8 [Aleph as (d7,90)] only via the 
pipeline:
... | iconv -f utf8 -t unicode | sed 's/\x0//g' | iconv -c -f iso88598 -t utf8

That is:

C3 A0 ==> `iconv -f utf8 -t unicode` ==> 00 E0

E0 hex = 224 dec, which is Aleph in iso88598, but for each byte I get an extra 00.

So the next part, `sed 's/\x0//g'`, discards the 00 bytes.

Then the `iconv -f iso88598 -t utf8` is a trivial step, but without the `-c` it 
complains about illegal characters.
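
To double-check the byte values at each stage, the three steps above can be 
mirrored in a short Python sketch. Python is only a stand-in for the shell 
tools here, and the function name `repair` is made up for illustration. The 
sketch assumes big-endian UTF-16 without a byte-order mark, to match the 
`00 E0` shown above; what `iconv -t unicode` actually emits may differ from 
machine to machine.

```python
def repair(raw: bytes) -> bytes:
    """Mirror the shell pipeline on a byte string (hypothetical helper)."""
    # Step 1: like `iconv -f utf8 -t unicode`.
    # Assumption: big-endian, no BOM, so C3 A0 becomes 00 E0.
    utf16 = raw.decode("utf-8").encode("utf-16-be")

    # Step 2: like the sed step, drop every 00 byte, leaving E0.
    single = utf16.replace(b"\x00", b"")

    # Step 3: like `iconv -c -f iso88598 -t utf8`.
    # errors="ignore" stands in for -c (skip bytes that cannot be converted).
    return single.decode("iso8859_8", errors="ignore").encode("utf-8")


aleph = repair(b"\xc3\xa0")  # the bytes received for Aleph
bet = repair(b"\xc3\xa1")    # the bytes received for Bet
print(aleph.hex(" "), bet.hex(" "))  # prints "d7 90 d7 91"
```

Like the shell version, this only behaves for text whose characters all sit 
below U+0100 after the first step; anything higher would be mangled once the 
00 bytes are removed.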


-- Some background: --

LANG=en_US.utf8 # but I had no success with any other LANG value.
LC_* is undefined
LANGUAGE=en_US:en

KDE 4.2.4
Kubuntu 9.04
English interface (the Hebrew interface in KDE4 is currently broken)


-- Questions: --

1. Is there an encoding where Aleph is 0xC3A0? If so, what is it? If not, how 
did I end up with it?

2. Is there a less ugly way to get from Aleph=0xC3A0 to proper UTF8?

3. Is this a bug, or stupidity on my end?


Thank you for your attention.
__
Cheers,
Chen.

_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
