Re: [Bug-wget] bad filenames (again)

Andries E. Brouwer Fri, 07 Aug 2015 08:48:24 -0700

On Fri, Aug 07, 2015 at 05:13:19PM +0200, Tim Ruehsen wrote:

> > Hi Tim,
> > 
> > I disagree. This is just a bug.
> > Nobody wants illegal filenames.
> > Even removing them is not entirely trivial since the filenames
> > produced by wget are not legal character sequences, so cannot be typed.
> 
> Hi Andries,
> 
> If it's a bug, let's just fix it (without breaking compatibility).
> 
> But as far as I understand escaping occurs within legal UTF-8 sequences
> - and you are right when saying this is a bug when we have a UTF-8 locale.
> 
> The solution would be something like
> 
> if locale is UTF-8
>   do not escape valid UTF-8 sequences
> else
>   keep wget's current behavior
> 
> Would you agree ?


Yes, not escaping in a UTF-8 environment when filenames are valid UTF-8
would certainly be a big improvement.
Probably other multibyte character sets would have the same issues.

> If URLs (and thus filenames) are not in UTF-8, Wget will convert them
> to UTF-8 before the above procedure (I guess that is what wget does
> anyways, well not 100% sure).

Will check. There are two conflicting desires:
(i) never change data, (ii) create files with a legal filename.

> If you provide a patch for this, we will appreciate that.

OK. Will find current wget source and send a patch.
(Not today, but soonish.)

Andries
