On Apr 23, 2007, at 15:54 UTC, Norman Palardy wrote:

> On 23-Apr-07, at 8:40 AM, Arnaud Nicolet wrote:
>
> > On 23 Apr 07, at 16:35, [EMAIL PROTECTED] wrote:
> >
> >> Note that if this binary data is actually text, it's up to you to
> >> keep track of what encoding it is.
> >
> > Can you tell me again how we can figure out the encoding of a text
> > for which we don't know the origin? I think you already explained,
> > but I'm not sure (I'm not even sure it's possible in RB).

You can't figure out such an encoding. That's why I said you need to *keep track* of what encoding it is; I didn't say you need to figure it out, since that's impossible.

Norman wrote:

> This is a good question since if you read a file you may or may not
> be able to guess the encoding using some means (BOM, etc.). There is a
> GuessJapaneseEncoding which will only guess what Japanese encoding
> text may be.

Right. And even that isn't completely reliable; it's just using a series of heuristics.

> Now how can you GUESS what encoding an arbitrary text file is so you
> can read it properly?

You can apply more heuristics; the latest version of StringUtils has a GuessEncoding method that does this for the major forms of Unicode, but doesn't go so far as to guess which legacy encoding it might be. That's a hard problem.

When you have control of the data -- for example, because you're saving and restoring something in your own app -- then you should define away the problem, by either storing the encoding info in the file or just declaring that you're always going to use UTF-8 (or whatever).

But when you're reading data from some other app, and have nothing to go on, then you have to just take your best guess, and give the user a way to correct it when your guess is wrong.

Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
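
For illustration only, the kind of heuristic described above for the major forms of Unicode might be sketched like this in Python. This is not the StringUtils GuessEncoding implementation, which is not shown in this thread; the names guess_unicode_encoding, read_text, user_override, and fallback are hypothetical. The sketch looks for a byte order mark, then checks whether the bytes are valid UTF-8, and otherwise gives up rather than trying to pick a legacy encoding.

```python
import codecs

# Byte order marks, with the UTF-32 entries ahead of UTF-16 because the
# UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.
# The codec names are Python's BOM-aware variants, which strip the BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_unicode_encoding(data):
    """Return a codec name if data looks like Unicode, else None.

    Hypothetical helper: a heuristic, not a guarantee. BOM-less UTF-16
    or UTF-32 is not recognized.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        # Legacy 8-bit text containing non-ASCII bytes is rarely valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # Probably a legacy encoding; we do not guess which.
    return "utf-8"


def read_text(data, user_override=None, fallback="latin-1"):
    """Decode data, letting the user's choice win over the guess.

    The fallback is an assumption made for this sketch, not a recommendation.
    """
    encoding = user_override or guess_unicode_encoding(data) or fallback
    return data.decode(encoding), encoding
```

Returning the chosen encoding alongside the text lets the caller show it to the user, who can then pass a different name as user_override when the guess turns out to be wrong.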
