> Is there a reliable tool for determining the encoding of a file? I know
> that distinguishing between 8-bit character sets, particularly between
> the various members of ISO 8859, cannot be done with certainty. However,
> there shouldn't be any harm in incorrectly guessing one of those if it
> is saved the same way. So I guess my question is whether there is
> already a tool out there that will tell me whether a file is UTF-8 or
> ISO 8859.
>
C-Kermit 8.0:

  http://www.columbia.edu/kermit/

The command is "directory /xfermode".  Here's an example:

  C-Kermit>dir /xfermode
  -rw-------   5845 2000-10-03 14:27:20 cp437.txt      (T)(8BIT)
  -rw-------   4105 2000-10-03 14:27:21 german.txt     (T)(7BIT)
  -rw-rw----  62458 2000-09-08 12:43:42 gku100.tar.gz  (B)
  -rw-------   3713 2000-10-03 14:27:21 hproman8.txt   (T)(8BIT)
  -rw-rw-r--  50000 1995-08-09 10:29:04 hpux80c.Z      (B)
  -rw-rw-r--  10358 1999-08-15 21:40:50 l1.ucs2        (T)(UCS2LE)
  -rw-------   5178 2000-10-03 14:27:22 latin1.txt     (T)(8BIT)
  -rw-------   4911 2000-10-03 14:27:23 next.txt       (T)(8BIT)
  -rw-rw-r--   2653 1999-08-03 10:14:12 test.utf8      (T)(UTF8)

This shows which mode the file would be transferred in:

  (T) = Text
  (B) = Binary

And if Text, which character-set class it belongs to:

  (7BIT)   = 7-Bit (ASCII or ISO 646)
  (8BIT)   = 8-bit (ISO 8859, CP437, CP1252, ...)
  (UTF8)   = UTF-8
  (UCS2BE) = Bare Unicode Big Endian
  (UCS2LE) = Bare Unicode Little Endian

and therefore, which translation mappings would be used.  For more info see:

  http://www.columbia.edu/kermit/ckermit2.html#x6
  http://www.columbia.edu/kermit/ckermit3.html#x4

- Frank

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
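
For anyone without C-Kermit at hand, the classes in the listing above can be roughly approximated in a few lines of Python. This is only a sketch and is not C-Kermit's actual detection algorithm: the function names classify() and classify_file() are made up for illustration, the NUL-byte test for binary data is an assumption, and UCS-2 is recognized only by its byte-order mark, so UCS-2 text without a BOM would be reported as binary here.

```python
def classify(data: bytes) -> str:
    """Return a label like '(T)(UTF8)' or '(B)' for a byte string.

    Hypothetical helper; the labels mimic the listing above, the
    heuristics are assumptions and not taken from C-Kermit.
    """
    # A byte-order mark is the only UCS-2 clue this sketch looks for.
    if data.startswith(b"\xff\xfe"):
        return "(T)(UCS2LE)"
    if data.startswith(b"\xfe\xff"):
        return "(T)(UCS2BE)"
    # Assumption: a NUL byte anywhere means the data is binary.
    if b"\x00" in data:
        return "(B)"
    # Every byte below 0x80: plain 7-bit text (ASCII or ISO 646).
    if all(b < 0x80 for b in data):
        return "(T)(7BIT)"
    # High bytes present: either well-formed UTF-8 or some 8-bit set.
    try:
        data.decode("utf-8")
        return "(T)(UTF8)"
    except UnicodeDecodeError:
        return "(T)(8BIT)"


def classify_file(path: str) -> str:
    """Classify a whole file; reads it entirely into memory."""
    with open(path, "rb") as f:
        return classify(f.read())
```

The UTF-8 check works because text in an ISO 8859 or similar 8-bit set very rarely forms valid UTF-8 multibyte sequences by accident, which is what makes the UTF-8 versus ISO 8859 question answerable in practice. Telling the individual 8-bit sets apart is not attempted.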