On Fri, Aug 21, 2015 at 01:31:45PM +0200, Tim Ruehsen wrote:
> > There is a remote site.
> > Nothing is known about this remote site.
>
> Wrong. Regarding HTTP(S), we exactly know the encoding
> of each downloaded HTML and CSS document
> (that's what I call 'remote encoding').
You are an optimist. In my experience Firefox rarely gets it right.
Let me find some random site. Say

  http://web2go.board19.com/gopro/go_view.php?id=12345

If I go there with Firefox, I get a go board with a lot of mojibake
around it. Firefox took the encoding to be Unicode. Trying out the
entries in the "Text encoding" menu, it turns out to be
"Chinese, Traditional".

> Leaving these misconfigured servers away as a special case

But most of the East Asian servers I meet are misconfigured in exactly
this way: they announce text/html with charset utf-8 and deliver some
other charset. So trusting the announced charset should be done
cautiously.

And you say "misconfigured servers", but often what is served is a
Unix or Windows file hierarchy in which several character sets occur.
The server doesn't know. The sysadmin doesn't know. A university
machine will have many users with files in several languages and
character sets.

Moreover, the character set of a filename is in general unrelated to
the character set of the contents of the file. That is most clear when
the file is not a text file. What character set is the filename

  http://www.win.tue.nl/~aeb/linux/lk/kn%e4ckebr%f6d.jpg

in? You recognize ISO 8859-1 or similar. My local machine uses UTF-8.
The HTTP headers say "Content-Type: image/jpeg". How can wget guess?

Andries
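To make the two guessing problems above concrete, here is a minimal Python sketch (not wget code; the helper name is mine, and the Big5 byte pair is an illustrative assumption). It shows (a) that an announced charset can at least be checked against the actual bytes, and (b) that the percent-escapes in the filename above decode to bytes that are invalid UTF-8 but valid ISO 8859-1, so the guess is inherently ambiguous.

```python
from urllib.parse import unquote_to_bytes

def body_matches_charset(body: bytes, announced: str) -> bool:
    """Trust an announced charset only if the bytes actually decode in it."""
    try:
        body.decode(announced)
        return True
    except (UnicodeDecodeError, LookupError):
        return False

# (a) A server announcing "charset=utf-8" while serving Big5 bytes
# (b"\xb3\xf2" is a structurally valid Big5 sequence) can be caught,
# because 0xb3 is never a valid UTF-8 lead byte:
big5_bytes = b"\xb3\xf2"
print(body_matches_charset(big5_bytes, "utf-8"))    # False

# (b) The percent-escapes in the filename decode to raw bytes:
raw = unquote_to_bytes("kn%e4ckebr%f6d.jpg")        # b'kn\xe4ckebr\xf6d.jpg'
print(body_matches_charset(raw, "utf-8"))           # False: not valid UTF-8
# ...but ISO 8859-1 maps every byte, so it always "succeeds",
# as do ISO 8859-15, cp1252, and friends: the answer stays ambiguous.
print(raw.decode("iso-8859-1"))                     # knäckebröd.jpg
```

Note the asymmetry: a failed decode can reject a wrong announcement, but a successful decode proves little, since single-byte charsets like ISO 8859-1 accept any byte sequence. Validation can narrow the guess; it cannot positively identify the right charset.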
