Update of bug #50383 (project wget):
Status: None => Confirmed
_______________________________________________________
Follow-up Comment #1:
Two problems here:
1. The command-line URL is converted by remote_to_utf8() in url_parse().
This is wrong; locale_to_utf8() must be used instead.
In many locales this makes no difference for the tilde, but I noticed it
while tracing wget.
2. After dequeuing (before download), wget converts the complete URL with
remote_to_utf8(). This is wrong: only the part coming from the remote should
be converted (~foo came from local input).
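To illustrate why the choice of charset matters, here is a minimal Python
sketch (not wget code). The locale charset ISO-8859-1, the remote charset
UTF-8, and the URL are assumptions picked for the example; the point is only
that locally typed bytes decoded with the remote charset come out damaged.

```python
# Assumed setup for illustration only: the user's locale is ISO-8859-1
# and the remote document declares UTF-8.
LOCALE_CHARSET = "iso-8859-1"
REMOTE_CHARSET = "utf-8"

# Hypothetical URL as typed on the command line.
typed = "http://localhost/~jörg/"

# The bytes the program actually receives from the command line.
raw = typed.encode(LOCALE_CHARSET)

# Decoding with the locale charset recovers what the user typed.
right = raw.decode(LOCALE_CHARSET)

# Decoding with the remote charset does not: the byte for "ö" is not
# valid UTF-8, so it is replaced by U+FFFD.
wrong = raw.decode(REMOTE_CHARSET, errors="replace")

print(right)
print(wrong)
```

With other charset pairs the damage can be silent (a different character
instead of a replacement marker), which is harder to spot.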
Suggested fix:
The charset conversion to UTF-8 should take place wherever input is taken
(from the command line or from the remote). Internally, wget should work with
UTF-8 only. That is what Wget2 already does.
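A rough Python sketch of the suggested approach, under stated assumptions: the
helper to_internal() is hypothetical and is not a wget or Wget2 function, and
the charsets and URLs are made up for the example. Each input is converted
once, with the charset of its own source, and everything after that works on
already-converted text.

```python
from urllib.parse import urljoin

def to_internal(raw: bytes, charset: str) -> str:
    """Hypothetical helper: convert input once, at the point it enters."""
    return raw.decode(charset)

# Assumed inputs: a command-line URL in the locale charset (ISO-8859-1)
# and a link found in a remote page encoded as Shift_JIS.
cmdline_bytes = "http://localhost/~jörg/".encode("iso-8859-1")
remote_link_bytes = "日本語.html".encode("shift_jis")

# Convert each input with the charset of its own source.
base = to_internal(cmdline_bytes, "iso-8859-1")
link = to_internal(remote_link_bytes, "shift_jis")

# From here on no further charset conversion is needed, so the local
# part of the URL is never touched by the remote charset.
resolved = urljoin(base, link)
print(resolved)
```

The design choice is that the charset is known only at the input boundary, so
that is the one place where conversion can be done correctly.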
I am attaching my Python test script to reproduce this issue, in case someone
wants to work on it. Copy it to testenv/ and either run it manually or add it
to Makefile.am.
(file #39814)
_______________________________________________________
Additional Item Attachment:
File name: Test-link-shiftjis.py Size: 1 KB
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?50383>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/