On Sunday 26 October 2003 01:27 am, Marco Baroni wrote:
> Thanks for your quick reply!
>
> > When you look at the file and you see
> > a c with cedilla, can you tell whether this is actually the
> > appropriate character, based on its context? Is this true
> > of all such characters?
>
> I do not see a c with cedilla, I see a rhombus with a question
> mark inside (which is the way my shell displays non-ASCII
> characters). I guess it is a c with cedilla from the context.
Which one? In ISO 8859-1, that would be

C7  uppercase C with cedilla
E7  lowercase c with cedilla

but in MacRoman it would be something else, and there are other possibilities. Lowercase is far more common (in French, for example), but I make no assumptions about the language of the text.

> So, I would like to ask you or anybody else: is there some
> kind of tool (e.g., a text editor) that I could use to
> discover which encoding is being used? (I tried with emacs but
> failed).

I don't have specific links, but this has been a topic of discussion on the Unicode mailing list. There is software that uses various heuristics to identify the character set and encoding of text files and streams. It doesn't distinguish among the various 8-bit character sets, however, so I don't think it would help you.

In simple cases like this, a hex editor is probably sufficient. There are many that show the value of each byte in a file along with one or more possible interpretations (binary, octal, and as a character or a number of one length or another, in either little-endian or big-endian order). On Linux I use Khexedit. There are numerous such editors for Mac and Windows as well, including those in the Norton Utilities.

The most likely case is that your file is in ISO 8859-1 or one of Microsoft's Windows code page extensions, both of which use the codes given above.

> Thanks again.

You're welcome.

> Marco
>
> ---
> Marco Baroni
> University of Bologna
> http://sslmit.unibo.it/~baroni

--
Edward Cherlin, Simputer Evangelist
Encore Technologies (S) Pte. Ltd.
Computers for all of us
http://www.simputerland.com, http://cherlin.blogspot.com
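The hex-editor approach described above can be approximated with a short script. This is a minimal sketch, assuming Python 3 and its standard codecs "latin-1" (ISO 8859-1), "cp1252" (a Windows code page) and "mac_roman". The function name show_candidates and the choice of candidate encodings are hypothetical. The script prints the offset and hex value of every non-ASCII byte, together with the character each candidate encoding would assign to it. Context then has to decide which reading makes sense.

```python
from typing import List

# Candidate 8-bit encodings to compare (an assumption, not an
# exhaustive list of what a file might actually be in).
CANDIDATES = ("latin-1", "cp1252", "mac_roman")


def show_candidates(data: bytes) -> List[str]:
    """Return one report line per non-ASCII byte: its offset, its hex
    value, and how each candidate encoding would interpret it."""
    lines = []
    for offset, byte in enumerate(data):
        if byte < 0x80:
            continue  # ASCII bytes read the same in all of these encodings
        readings = []
        for enc in CANDIDATES:
            try:
                readings.append(f"{enc}={bytes([byte]).decode(enc)!r}")
            except UnicodeDecodeError:
                # e.g. a few bytes in 0x80-0x9F are unassigned in cp1252
                readings.append(f"{enc}=<undefined>")
        lines.append(f"{offset:08x}: {byte:02X}  " + "  ".join(readings))
    return lines


if __name__ == "__main__":
    # Demo on a sample string encoded as ISO 8859-1; to inspect a real
    # file, pass open(path, "rb").read() instead.
    sample = "Ça va, garçon".encode("latin-1")
    for line in show_candidates(sample):
        print(line)
```

For a byte of E7, the latin-1 and cp1252 columns both show a lowercase c with cedilla, while the mac_roman column shows a different character. That is the ambiguity a hex editor can help resolve.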