Package: wget
Version: 1.10.2+1.11.beta1-1
Followup-For: Bug #411290
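It turns out that wget URL-encodes bytes 128-159 (the C1 control range in the ISO 8859 encodings). This is wrong because:
1. They are not control characters in other 8-bit encodings (e.g. Windows-1251 uses 0x80 and 0x81 for the letters Ђ and Ѓ).
2. In UTF-8 it makes no sense: bytes 0x80-0x9F are continuation bytes there, so escaping them splits multibyte characters and produces invalid UTF-8 sequences.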
--- src/url.c.old 2007-11-18 09:09:51.000000000 +0400
+++ src/url.c 2007-11-18 09:26:59.000000000 +0400
@@ -1261,8 +1261,8 @@
 0, 0, 0, 0, 0, 0, 0, 0, /* p q r s t u v w */
 0, 0, 0, 0, W, 0, 0, C, /* x y z { | } ~ DEL */
- C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, /* 128-143 */
- C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, /* 144-159 */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 128-143 */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 144-159 */
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
P.S. Since newer wget versions no longer show the bug for non-existent URLs, here is another URL that reproduces it:
http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0