[M-Dev] GuessUnicodeCharset

Nerijus Baliunas Sun, 26 Sep 2004 15:06:05 -0700

Hello,

GuessUnicodeCharset() sometimes fails on Lithuanian texts when the
first non-ASCII character (�, � for example) is also present in
ISO-8859-1 or -2, in addition to the Baltic encodings -4 and -13.
ISO-8859-1 or -2 is then chosen for the conversion from UTF-8 and
the text is garbled, because it usually contains other characters that
are not in -1 or -2. I thought of 2 solutions:
* Find all non-ASCII characters instead of only the first (or at least
3-5 of them) and analyze all of them.
* If the encoding found is ISO-8859-1 or -2, continue searching for
non-ASCII chars until some other encoding is matched, then stop.
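
To compare the two ideas above, here is a rough sketch in Python. It is
not Mahogany's code: the candidate list, its order, and all function names
are assumptions made for illustration, and Python's built-in codecs stand
in for whatever tables GuessUnicodeCharset() really uses.

```python
# Hypothetical candidate list and priority order (an assumption, not taken
# from Mahogany's source). These are Python's names for ISO-8859-1/-2/-4/-13.
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_4", "iso8859_13"]
GENERIC = ("iso8859_1", "iso8859_2")


def can_encode(ch, charset):
    """Return True if the character ch is representable in charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def first_match(ch, candidates):
    """Return the first candidate charset containing ch, or None."""
    for cs in candidates:
        if can_encode(ch, cs):
            return cs
    return None


def guess_from_first(text, candidates=CANDIDATES):
    """Behavior described in the report: look at the first non-ASCII char only."""
    for ch in text:
        if ord(ch) > 127:
            return first_match(ch, candidates)
    return None


def guess_from_all(text, candidates=CANDIDATES, limit=None):
    """Idea 1: keep only charsets containing every non-ASCII char examined.

    With limit set, only the first `limit` non-ASCII chars are examined.
    """
    remaining = list(candidates)
    seen = 0
    for ch in text:
        if ord(ch) <= 127:
            continue
        remaining = [cs for cs in remaining if can_encode(ch, cs)]
        seen += 1
        if not remaining or (limit is not None and seen >= limit):
            break
    if seen == 0:
        return None  # pure ASCII text, nothing to guess from
    return remaining[0] if remaining else None


def guess_until_specific(text, candidates=CANDIDATES, generic=GENERIC):
    """Idea 2: on an ISO-8859-1/-2 match, keep scanning for a different one."""
    fallback = None
    for ch in text:
        if ord(ch) <= 127:
            continue
        match = first_match(ch, candidates)
        if match is None:
            continue
        if match not in generic:
            return match  # some other encoding matched, stop here
        if fallback is None:
            fallback = match
    return fallback  # only -1/-2 (or nothing) ever matched


if __name__ == "__main__":
    sample = "\u0161uo \u0117da"  # s-caron is in -2, -4, -13; e-dot-above only in -4, -13
    print(guess_from_first(sample))
    print(guess_from_all(sample))
    print(guess_until_specific(sample))
```

On this sample the first-character guess picks ISO-8859-2, which cannot
represent the second letter, while both proposed variants pick ISO-8859-4.
One difference between them: idea 1 guarantees that the result covers every
character it examined, whereas idea 2 stops at the first character that
matches another encoding and does not re-check the earlier ones.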


Which is better? Do you have other ideas?

Regards,
Nerijus


_______________________________________________
Mahogany-Developers mailing list
https://lists.sourceforge.net/lists/listinfo/mahogany-developers
