Re: Questions regarding utf-8

Matthias Urlichs Fri, 16 May 2003 10:40:27 -0500

Hi, John Darrington wrote:

> Given a text file, it will attempt to guess the natural language in
> which it was written. I'm sure it would be fairly simple to modify it to
> guess the charset.  If you point me to a reasonably large set of example
> files, I'll see what I can do.


You could use your existing samples, which hopefully include a number of
non-ASCII characters, recode them to UTF-8, and then convert them from
there into a few other encodings to get test files -- German text would
typically be in Latin-1 (ISO-8859-1), Latin-9 (ISO-8859-15), or one of the
Windows- or Mac-specific charsets for Western or Central Europe.
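
A minimal sketch of that idea in Python, assuming the samples are already
available as Unicode text. The helper name make_charset_samples and the list
of candidate encodings are illustrative assumptions, not something taken
from the existing tool; the codec names are the ones Python's standard
library uses.

```python
# Hypothetical helper: turn one Unicode sample into byte strings in several
# legacy charsets, to serve as labelled test data for a charset guesser.
# The encoding list is an assumption covering West/Central European text.
CANDIDATE_ENCODINGS = [
    "iso8859-1",   # Latin-1
    "iso8859-15",  # Latin-9 (adds the euro sign, among others)
    "cp1252",      # Windows, Western Europe
    "cp1250",      # Windows, Central Europe
    "mac_roman",   # Mac, Western Europe
    "mac_latin2",  # Mac, Central Europe
]


def make_charset_samples(text, encodings=CANDIDATE_ENCODINGS):
    """Return {encoding_name: encoded_bytes} for every encoding that can
    represent all characters of `text`; other encodings are skipped."""
    samples = {}
    for enc in encodings:
        try:
            samples[enc] = text.encode(enc)
        except UnicodeEncodeError:
            # This charset lacks at least one character of the sample.
            continue
    return samples


# Example: a short German sample with non-ASCII characters.
samples = make_charset_samples("Grüße aus Nürnberg")
```

Each value in the returned dictionary can be written to its own file, with
the dictionary key recording which charset the guesser is expected to
report. Samples consisting only of ASCII characters are of little use here,
since they encode to identical bytes in all of these charsets.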

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
-- 
Dimensions will always be expressed in the least usable term.
EXAMPLE: Velocity will be expressed in furlongs per fortnight.
