On Apr 23, 2007, at 15:54 UTC, Norman Palardy wrote:

> On 23-Apr-07, at 8:40 AM, Arnaud Nicolet wrote:
> 
> > On 23 Apr 07, at 16:35, [EMAIL PROTECTED] wrote:
> >
> >> Note that if this binary data is actually text, it's up to you to
> >> keep track of what encoding it is.
> >
> > Can you tell me again how we can figure out the encoding of a text
> > for which we don't know the origin? I think you already explained,
> > but I'm not sure (I'm not even sure it's possible in RB).

You can't figure out such an encoding.  That's why I said you need to
*keep track* of what encoding it is; I didn't say you need to figure it
out, since that's impossible.

Norman wrote:

> This is a good question, since if you read a file you may or may not
> be able to guess the encoding using some means (BOM, etc.). There is a
> GuessJapaneseEncoding, which will only guess which Japanese encoding
> the text may be.

Right.  And even that isn't completely reliable; it's just using a
series of heuristics.

> Now how can you GUESS what encoding an arbitrary text file is in so
> you can read it properly?

You can apply more heuristics; the latest version of StringUtils has a
GuessEncoding method that does this for the major forms of Unicode, but
doesn't go so far as to guess which legacy encoding it might be. 
That's a hard problem.
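
To make the idea concrete, here is a minimal sketch of that kind of heuristic, written in Python rather than REALbasic. It is not the StringUtils code; the function name guess_unicode_encoding is made up for illustration. It checks for a byte-order mark first, then tests whether the bytes are valid UTF-8, and gives up (returns None) on anything else instead of guessing a legacy encoding.

```python
import codecs
from typing import Optional

# Order matters: the UTF-32 LE BOM begins with the same two bytes as the
# UTF-16 LE BOM, so the UTF-32 marks must be tested before the UTF-16 ones.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_unicode_encoding(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess a Unicode encoding, or None if unsure."""
    # 1. A byte-order mark is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM: bytes that decode strictly as UTF-8 are very likely UTF-8
    #    (pure ASCII also lands here, which is harmless).
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Not Unicode as far as we can tell; which legacy encoding it
        #    might be is deliberately left unanswered.
        return None
    return "utf-8"
```

The returned names are Python codec names, so the result can be passed straight to data.decode(). A None result means "ask the user or fall back to a default", not "this is definitely not text".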

When you have control of the data -- for example, because you're saving
and restoring something in your own app -- then you should define away
the problem, by either storing the encoding info in the file or just
declaring that you're always going to use UTF-8 (or whatever).  But
when you're reading data from some other app, and have nothing to go
on, then you have to just take your best guess, and give the user a way
to correct it when your guess is wrong.
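
As a rough illustration of both cases, here is a sketch in Python (again, not REALbasic). The "ENC:" header line and the names pack_text and unpack_text are assumptions invented for this example, not an existing file format. Your own files carry their encoding name in a one-line header; foreign data without the header is decoded with a default guess that the caller (ultimately the user) can override.

```python
from typing import Optional

MAGIC = b"ENC:"  # hypothetical marker for files written by our own app


def pack_text(text: str, encoding: str = "utf-8") -> bytes:
    """Write the encoding name on a header line, then the encoded text."""
    header = MAGIC + encoding.encode("ascii") + b"\n"
    return header + text.encode(encoding)


def unpack_text(data: bytes, fallback: str = "utf-8",
                override: Optional[str] = None) -> str:
    """Decode using the stored encoding, or a best guess for foreign data.

    `override` is the user's correction for when the guess was wrong.
    """
    if data.startswith(MAGIC):
        # Our own file: the header is ASCII, so the first newline ends it.
        header, _, body = data.partition(b"\n")
        declared = header[len(MAGIC):].decode("ascii")
    else:
        # Foreign data with nothing to go on: fall back to a default guess.
        body, declared = data, fallback
    # errors="replace" keeps a wrong guess from raising; the user sees
    # replacement characters and can pick another encoding.
    return body.decode(override or declared, errors="replace")
```

For example, unpack_text(pack_text("héllo", "utf-16")) round-trips without the caller naming an encoding, while unpack_text(raw_bytes, override="latin-1") is what a "reinterpret as..." menu item might call after a bad guess.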

Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]

