David Woolley dixit:

>> Here under Windows there are constant references to the character that
>> begins a 16-bit-wide-character file (FF FE) or UTF-8 file (EF BB BF).
>
> These are all valid printable characters in ISO 8859/x. Although somewhat
> unlikely combinations, they are not reserved sequences.

We are talking about a file that _begins_ with these byte sequences
here, not a file that consists solely of them.

For UCS-*, things are quite clear: you get <\0h\0t\0m\0l\0>, so it
obviously is not any 8-bit encoding. For UTF-8, it's not that easy, but:

• If the file is UTF-8 and uses any non-ASCII characters, it will
  almost always contain an octet from the [0x80‥0x9F] range. ISO 8859-x
  has no printable characters there (the range is reserved for C1
  control codes), which practically rules out latin1.

• In case of doubt: if the file contains only valid UTF-8 with no
  encoding errors (invalid multibyte sequences), lean towards UTF-8,
  as it's the current standard replacing the 8-bit character sets.

• If the file contains only ASCII characters, point #1 above no
  longer applies, but the difference is moot anyway.

bye,
//mirabilos
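
The heuristics in this message can be sketched roughly as follows. This is only an illustration in Python, not anything Lynx does: the names guess_encoding and has_c1_octets are hypothetical, the FE FF (big-endian) check and the "iso-8859-1" fallback are assumptions added here, and UTF-32 byte order marks are not handled.

```python
import codecs


def has_c1_octets(data: bytes) -> bool:
    """True if any octet falls in 0x80..0x9F (C1 controls in ISO 8859-x)."""
    return any(0x80 <= b <= 0x9F for b in data)


def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess an encoding name for raw bytes (hypothetical helper)."""
    # A file that *begins* with a byte order mark is a strong hint.
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF (assumption, not in the thread)
        return "utf-16-be"

    # Pure ASCII reads the same in UTF-8 and ISO 8859-x: the choice is moot.
    if data.isascii():
        return "ascii"

    # Non-ASCII data that is valid UTF-8 throughout: lean towards UTF-8.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Invalid multibyte sequences: assume an 8-bit character set.
        return fallback
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding(b"\xef\xbb\xbf<html>"))
    print(guess_encoding("Stra\u00dfe".encode("utf-8")))
    print(guess_encoding("Stra\u00dfe".encode("iso-8859-1")))
```

The strict decode does most of the work. has_c1_octets is kept separate so that point #1 can be checked on its own, for example to see how often UTF-8 text really contains such octets.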