> Date: Sun, 13 Dec 2015 20:04:31 +0100
> From: "Andries E. Brouwer" <andries.brou...@cwi.nl>
> Cc: "Andries E. Brouwer" <andries.brou...@cwi.nl>, bug-wget@gnu.org
>
> On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
>
> > If no one is going to pick up the gauntlet, I will sit down and do it
> > myself, although I'm terribly busy with Emacs 25.1 release.
>
> Good!

OK, I'm ready to send the patch series.  I tested it on GNU/Linux and
on MS-Windows, and it passed all my tests.

I will send the patch in 2 parts.  This 1st part stops wget from
treating codepoints between 128 and 159 as control characters.
Treating them that way only makes sense with ISO-8859 encodings, which
are used by a tiny minority of systems nowadays.  Both UTF-8 and the
Windows codepages have printable characters and/or meaningful codes in
that range that must not be munged.

If we want to preserve back-compatibility in this respect, then a
variant of Tim's or Andries's patch could be used here, but the test
in it should be inverted: only if the locale's codeset is
ISO-8859-SOMETHING should we treat these codepoints as control
characters.  All the other codesets should pass these codes unaltered.

diff --git a/src/url.c b/src/url.c
index c62867f..d984bf7 100644
--- a/src/url.c
+++ b/src/url.c
@@ -1399,8 +1404,8 @@ UVWC, VC, VC, VC, VC, VC, VC, VC, /* NUL SOH STX ETX EOT ENQ ACK BEL */
    0,  0,  0,  0,  0,  0,  0,  0,   /* p   q   r   s   t   u   v   w   */
    0,  0,  0,  0,  W,  0,  0,  C,   /* x   y   z   {   |   }   ~   DEL */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 128-143 */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 144-159 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 128-143 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 144-159 */
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
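
For the back-compatible variant mentioned before the diff, the inverted
test might look roughly like the sketch below.  This is only an
illustration, not code from wget or from Tim's or Andries's patches:
the names codeset_is_iso8859 and c1_byte_is_control are hypothetical,
and the codeset name is assumed to be supplied by the caller.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of wget): return true if CODESET names
   an ISO-8859 encoding, e.g. "ISO-8859-1" or "iso8859-15".  */
static bool
codeset_is_iso8859 (const char *codeset)
{
  static const char prefix[] = "ISO";
  size_t i;

  if (codeset == NULL)
    return false;

  /* Case-insensitive match of the "ISO" prefix; a short string fails
     here because its terminating NUL never equals a prefix letter.  */
  for (i = 0; prefix[i] != '\0'; i++)
    if (toupper ((unsigned char) codeset[i]) != prefix[i])
      return false;

  /* Accept both the "ISO-8859" and "ISO8859" spellings.  */
  if (codeset[i] == '-' || codeset[i] == '_')
    i++;

  return strncmp (codeset + i, "8859", 4) == 0;
}

/* Hypothetical predicate: bytes 128..159 count as control characters
   only under an ISO-8859 codeset; all other codesets pass them
   unaltered.  */
static bool
c1_byte_is_control (unsigned char c, const char *codeset)
{
  return c >= 128 && c <= 159 && codeset_is_iso8859 (codeset);
}
```

On POSIX systems the codeset name could come from nl_langinfo (CODESET)
after calling setlocale (LC_CTYPE, ""); MS-Windows would need a
different source for it, which this sketch leaves to the caller.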