> Date: Tue, 18 Aug 2015 12:55:50 +0200
> From: "Andries E. Brouwer" <[email protected]>
> Cc: [email protected], "Andries E. Brouwer" <[email protected]>,
>     Eli Zaretskii <[email protected]>
>
> The point is: it is the user's choice to load a font. (Or to set a locale.)

Most users never change their locale, unless they are trying something
special, precisely because their file names would then display as
mojibake.  So wget should IMO by default cater to this use case, and
allow saving the bytes verbatim as an option.

> For historical reasons a single directory can have files with names
> in several character sets.

Again, this is a rare situation.  We shouldn't punish the majority on
behalf of such rare use cases.

> All this is about the local situation. One cannot know "the character set"
> of a filename because that concept does not exist in Unix.

Of course, it exists.  The _filesystem_ doesn't know it, but users do.

> About the remote situation even less is known.

Assuming UTF-8 will go a long way towards resolving this.  When this
is not so, we have the --remote-encoding switch.

> It would be terrible if wget decided to use obscure heuristics to
> invent a remote character set and then invoke iconv.

But what you suggest instead -- create a file name whose bytes are an
exact copy of the remote -- is just another heuristic.  And the
effects are no less terrible, because file names will become
illegible, especially on systems where UTF-8 is not the locale's
codeset.  I'm okay with having an option to do that, but it shouldn't
be the default, IMO.
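
To make the difference concrete, here is a small Python sketch (not
wget code; the helper name local_file_name and the sample file name
are made up for illustration).  It assumes the remote name is UTF-8
and the local codeset is Latin-1, and compares a verbatim copy of the
bytes with a converted one, falling back to the verbatim bytes when
the conversion is not possible:

```python
def local_file_name(remote_bytes, local_codeset, remote_encoding="utf-8"):
    """Hypothetical helper: convert a remote file name to the local codeset.

    Returns the bytes verbatim when the name is not valid in the assumed
    remote encoding, or cannot be represented in the local codeset.
    """
    try:
        text = remote_bytes.decode(remote_encoding)
        return text.encode(local_codeset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return remote_bytes


# A remote name sent as UTF-8 ("resume.txt" with two e-acute letters).
remote = "r\u00e9sum\u00e9.txt".encode("utf-8")

# Verbatim copy, as a Latin-1 locale would display it: mojibake.
verbatim_view = remote.decode("latin-1")

# Converted copy, displayed in the same locale: legible.
converted_view = local_file_name(remote, "latin-1").decode("latin-1")

print(repr(verbatim_view))
print(repr(converted_view))
```

The fallback branch is the "save the bytes verbatim" behavior; in this
sketch it is the exception rather than the default.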
