> header indicates the character set. The characters have hex codes
> 0x92 (apostrophe), 0x93 (left quote), 0x94 (right quote), and 0x97
> (em-dash).
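As a quick check of the byte values quoted above, here is a minimal Python sketch that decodes them under two candidate character sets. It only illustrates how the bytes map; it is not how Lynx itself decides. The variable names are made up for the example, and "cp1252" and "latin-1" are Python's codec names for windows-1252 and ISO 8859-1.

```python
import unicodedata

# The four byte values reported in the quoted message.
raw = bytes([0x92, 0x93, 0x94, 0x97])

# Decode the same bytes under the two candidate character sets.
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

# Under windows-1252 each byte is a printable punctuation character.
for byte, ch in zip(raw, as_cp1252):
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# Python's latin-1 codec maps the same bytes to C1 control characters
# (Unicode category "Cc"), not to anything printable.
latin1_categories = [unicodedata.category(ch) for ch in as_latin1]
print(latin1_categories)
```

Under windows-1252 the bytes come out as the apostrophe, quotes, and dash described in the quoted text; under ISO 8859-1 none of them is a printable character.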
They are not ISO 8859/1; they are invalid codes in that character set. They are probably windows-1252 characters. Unfortunately, Microsoft software delights in using its proprietary codes for smart quotes like this. Historically, it would even generate &#148;, even though that is invalid in all versions of HTML (entities always encode the standard character set for HTML, which is ISO 10646, with some exclusions, for HTML 4 upwards, and ISO 8859/1 before that).

> "7-bit approximation;" setting the option "assumed document character
> set" to ISO-8859 and to UTF-8; setting the option "raw 8-bit

You need to set the assumed document character set to windows-1252 if the actual character codes are being used. That should work when the actual characters are used, but it probably won't work if the site actively lies about its character set. If entities are used, I don't know what heuristics Lynx uses for undefined numeric entity values.

_______________________________________________
Lynx-dev mailing list
http://lists.nongnu.org/mailman/listinfo/lynx-dev
