> Date: Sun, 13 Dec 2015 20:04:31 +0100
> From: "Andries E. Brouwer" <andries.brou...@cwi.nl>
> Cc: "Andries E. Brouwer" <andries.brou...@cwi.nl>, bug-wget@gnu.org
> 
> On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
> 
> > If no one is going to pick up the gauntlet, I will sit down and do it
> > myself, although I'm terribly busy with Emacs 25.1 release.
> 
> Good!

OK, I'm ready to send the patch series.  I tested it on GNU/Linux and
on MS-Windows, and it passed all my tests.

I will send the patch in 2 parts.  This 1st part stops wget from
treating codepoints between 128 and 159 as control characters.  Such
treatment only makes sense with ISO-8859 encodings, which are used by
a tiny minority of systems nowadays.  Both UTF-8 and the Windows
codepages have printable characters and/or meaningful codes in that
range that must not be munged.
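
To illustrate the point, here is a small sketch (not part of the
patch; the function name is made up for this example) that counts the
bytes of a string falling in the 128-159 range.  The UTF-8 encoding of
the Euro sign is E2 82 AC, and the Windows-1252 encoding is the single
byte 80, so both contain a byte that the old table would have munged.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only: count the bytes of S
   whose value lies in the 128-159 range.  */
static size_t
count_c1_range_bytes (const char *s)
{
  size_t n = 0;

  for (; *s; s++)
    {
      unsigned char c = (unsigned char) *s;
      if (c >= 128 && c <= 159)
        n++;
    }
  return n;
}
```

For example, count_c1_range_bytes ("\xE2\x82\xAC") finds one such byte
(0x82), and so does count_c1_range_bytes ("\x80").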

If we want to preserve back-compatibility in this respect, then a
variant of Tim's or Andries's patch could be used here, but the test
in it should be inverted: we should treat these codepoints as control
characters only if the locale's codeset is ISO-8859-SOMETHING.  All
the other codesets should pass these codes unaltered.
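
A minimal sketch of what such an inverted test might look like, under
some assumptions: the function names below are hypothetical and are
not taken from wget or from either of the patches mentioned, and the
codeset name is obtained with the POSIX nl_langinfo (CODESET), which
is not available on MS-Windows, where the name would have to come from
some other source.

```c
#include <ctype.h>
#include <langinfo.h>
#include <stdbool.h>

/* Hypothetical helper: return true if CODESET names an ISO-8859
   variant.  The comparison ignores case and skips '-' and '_', so
   "ISO-8859-1", "ISO_8859-1" and "iso8859-15" all match.  */
static bool
codeset_is_iso8859 (const char *codeset)
{
  const char *want = "ISO8859";

  if (!codeset)
    return false;
  while (*want)
    {
      while (*codeset == '-' || *codeset == '_')
        codeset++;
      if (toupper ((unsigned char) *codeset) != *want)
        return false;
      codeset++;
      want++;
    }
  return true;
}

/* Treat bytes 128-159 as control characters only under ISO-8859-*;
   every other codeset passes them through unaltered.  */
static bool
c1_byte_is_control (unsigned char c, const char *codeset)
{
  return c >= 128 && c <= 159 && codeset_is_iso8859 (codeset);
}

/* The same test against the current locale.  Assumes the program has
   already called setlocale for LC_CTYPE or LC_ALL.  */
static bool
c1_byte_is_control_in_locale (unsigned char c)
{
  return c1_byte_is_control (c, nl_langinfo (CODESET));
}
```

With this, c1_byte_is_control (0x85, "ISO-8859-1") is true, while the
same byte under "UTF-8" or "CP1252" is left alone.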


diff --git a/src/url.c b/src/url.c
index c62867f..d984bf7 100644
--- a/src/url.c
+++ b/src/url.c
@@ -1399,8 +1404,8 @@ UVWC, VC, VC, VC,  VC, VC, VC, VC,   /* NUL SOH STX ETX  EOT ENQ ACK BEL */
    0,  0,  0,  0,   0,  0,  0,  0,   /* p   q   r   s    t   u   v   w   */
    0,  0,  0,  0,   W,  0,  0,  C,   /* x   y   z   {    |   }   ~   DEL */
 
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 128-143 */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 144-159 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 128-143 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 144-159 */
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
 
