> Is there a reliable tool for determining the encoding of a file?  I know
> that distinguishing between 8-bit character sets, particularly between
> the various members of ISO 8859, cannot be done with certainty.  However,
> there shouldn't be any harm in incorrectly guessing one of those if it
> is saved the same way.  So I guess my question is whether there is
> already a tool out there that will tell me whether a file is UTF-8 or
> ISO 8859.
> 
C-Kermit 8.0:

  http://www.columbia.edu/kermit/

The command is "directory /xfermode" (abbreviated to "dir" below).  Here's an example:

  C-Kermit>dir /xfermode
  -rw-------      5845  2000-10-03 14:27:20  cp437.txt (T)(8BIT)
  -rw-------      4105  2000-10-03 14:27:21  german.txt (T)(7BIT)
  -rw-rw----     62458  2000-09-08 12:43:42  gku100.tar.gz (B)
  -rw-------      3713  2000-10-03 14:27:21  hproman8.txt (T)(8BIT)
  -rw-rw-r--     50000  1995-08-09 10:29:04  hpux80c.Z (B)
  -rw-rw-r--     10358  1999-08-15 21:40:50  l1.ucs2 (T)(UCS2LE)
  -rw-------      5178  2000-10-03 14:27:22  latin1.txt (T)(8BIT)
  -rw-------      4911  2000-10-03 14:27:23  next.txt (T)(8BIT)
  -rw-rw-r--      2653  1999-08-03 10:14:12  test.utf8 (T)(UTF8)

This shows which mode the file would be transferred in:

  (T)      = Text
  (B)      = Binary

And if Text, which character-set class it belongs to:

  (7BIT)   = 7-Bit (ASCII or ISO 646)
  (8BIT)   = 8-bit (ISO 8859, CP437, CP1252, ...)
  (UTF8)   = UTF-8
  (UCS2BE) = Bare Unicode Big Endian
  (UCS2LE) = Bare Unicode Little Endian

and therefore, which translation mappings would be used.
For more info see:

  http://www.columbia.edu/kermit/ckermit2.html#x6
  http://www.columbia.edu/kermit/ckermit3.html#x4
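
If you only need the UTF-8 versus 8-bit distinction and would rather
script it, the same classes can be approximated in a few lines of
Python.  This is a rough sketch, not Kermit's actual scanning algorithm:
the function name classify() is made up for illustration, it assumes
UCS-2 files begin with a byte-order mark, and it makes no attempt at
the Text/Binary decision.

```python
import codecs


def classify(data: bytes) -> str:
    """Return a rough character-set class for a byte string.

    Hypothetical helper; the labels mirror the ones in the listing above.
    """
    # Assumption: UCS-2 text is only recognized by its byte-order mark.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UCS2LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UCS2BE"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF8"
    # No byte has the high bit set: ASCII or another 7-bit set.
    if all(b < 0x80 for b in data):
        return "7BIT"
    # High-bit bytes that form valid UTF-8 sequences are very unlikely
    # to be ISO 8859 text, so a clean decode is taken as UTF-8.
    try:
        data.decode("utf-8")
        return "UTF8"
    except UnicodeDecodeError:
        return "8BIT"


if __name__ == "__main__":
    import sys

    # Usage: python classify.py file1 file2 ...
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(name, classify(f.read()))
```

As in the question, a result of 8BIT says nothing about which ISO 8859
part (or code page) the file uses, only that it is not valid UTF-8.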

- Frank

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
