On Fri, Aug 07, 2015 at 05:13:19PM +0200, Tim Ruehsen wrote:
> > Hi Tim,
> >
> > I disagree. This is just a bug.
> > Nobody wants illegal filenames.
> > Even removing them is not entirely trivial since the filenames
> > produced by wget are not legal character sequences, so cannot be typed.
>
> Hi Andries,
>
> If it's a bug, let's just fix it (without breaking compatibility).
>
> But as far as I understand, escaping occurs within legal UTF-8 sequences
> - and you are right when saying this is a bug when we have a UTF-8 locale.
>
> The solution would be something like
>
>   if locale is UTF-8
>     do not escape valid UTF-8 sequences
>   else
>     keep wget's current behavior
>
> Would you agree?
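A minimal C sketch of the proposal quoted above, for illustration only. It assumes the escaped byte set of the wget manual's description of `--restrict-file-names=unix` ('/' plus bytes 0-31 and 128-159), which is why UTF-8 names break today: continuation bytes 0x80-0x9F fall in that range, so "Ä" (C3 84) becomes the invalid sequence C3 "%84". All function names (`locale_is_utf8`, `utf8_seq_len`, `is_unsafe_byte`, `escape_filename`, `make_local_filename`) are hypothetical and are not wget's internals.

```c
#define _XOPEN_SOURCE 700
#include <langinfo.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if the current locale's codeset is UTF-8. glibc reports "UTF-8";
   "utf8" is accepted as an assumed alternative spelling on other systems. */
static bool locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && (strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0);
}

/* Returns 1 for an ASCII byte, 2-4 for a valid multibyte UTF-8 sequence
   starting at s, or 0 if the bytes are not valid UTF-8. Overlong forms,
   surrogates and code points above U+10FFFF are rejected. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return 0;
    unsigned char c = s[0];
    if (c < 0x80)
        return 1;
    size_t len;
    unsigned int cp;
    if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else
        return 0;                    /* stray continuation or invalid lead byte */
    if (n < len)
        return 0;                    /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
        return 0;
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        return 0;
    return len;
}

/* Assumed "current behavior": escape '/', bytes 0-31 and bytes 128-159. */
static bool is_unsafe_byte(unsigned char c)
{
    return c == '/' || c < 32 || (c >= 128 && c <= 159);
}

/* Copies name into out, which must hold 3 * strlen(name) + 1 bytes,
   escaping unsafe bytes as %XX. If utf8_ok is true, whole valid multibyte
   UTF-8 sequences are copied verbatim instead of being escaped bytewise. */
static void escape_filename(const char *name, bool utf8_ok, char *out)
{
    const unsigned char *s = (const unsigned char *) name;
    size_t n = strlen(name);
    size_t i = 0;
    char *p = out;
    while (i < n) {
        size_t len = utf8_ok ? utf8_seq_len(s + i, n - i) : 0;
        if (len > 1) {
            memcpy(p, s + i, len);   /* keep the valid sequence intact */
            p += len;
            i += len;
        } else {
            if (is_unsafe_byte(s[i]))
                p += sprintf(p, "%%%02X", s[i]);
            else
                *p++ = (char) s[i];
            i++;
        }
    }
    *p = '\0';
}

/* The proposal: skip escaping of valid UTF-8 only in a UTF-8 locale. */
void make_local_filename(const char *name, char *out)
{
    escape_filename(name, locale_is_utf8(), out);
}
```

Because `make_local_filename` consults the locale, a program would call `setlocale(LC_ALL, "")` once at startup; in the default "C" locale the codeset is not UTF-8, so the existing escaping is kept unchanged, which preserves compatibility. Invalid bytes are still escaped even in a UTF-8 locale.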
Yes, not escaping in a UTF-8 environment when filenames are valid UTF-8 would certainly be a big improvement. Other multibyte character sets probably have the same issue.

> If URLs (and thus filenames) are not in UTF-8, Wget will convert them
> to UTF-8 before the above procedure (I guess that is what wget does
> anyways, well not 100% sure).

Will check. There are two conflicting desires: (i) never change data, (ii) create files with a legal filename.

> If you provide a patch for this we will appreciate that.

OK. Will get the current wget source and send a patch. (Not today, but soonish.)

Andries
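Whether wget already converts non-UTF-8 names to UTF-8 is unverified in this thread. If a patch needed such a step, POSIX `iconv` is one way to do it. The sketch below assumes the source charset name is already known; `to_utf8` is a hypothetical helper, not a wget function.

```c
#define _XOPEN_SOURCE 700
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Converts the NUL-terminated string in, encoded in from_charset, to a
   newly malloc'd UTF-8 string. Returns NULL on any failure (unknown charset,
   invalid input, or an output buffer that turns out to be too small).
   Four output bytes per input byte is an assumed upper bound; it holds for
   single-byte charsets such as ISO-8859-1. */
char *to_utf8(const char *in, const char *from_charset)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t) -1)
        return NULL;

    size_t inleft = strlen(in);
    size_t outsize = inleft * 4 + 1;
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *) in;          /* iconv takes char **, not const */
    char *outp = out;
    size_t outleft = outsize - 1;     /* reserve room for the terminator */
    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (r != (size_t) -1)
        r = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (r == (size_t) -1) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}
```

Converting before escaping keeps the UTF-8 check as the single decision point, but it is exactly the "change data" side of the trade-off above: the bytes on disk no longer match the bytes on the server.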
