On 11.10.19 11:07, Eli Zaretskii wrote:
>> From: Cameron Tacklind <[email protected]>
>> Date: Thu, 10 Oct 2019 20:31:02 -0700
>>
>> The error is pretty clearly an encoding conversion issue, going from UTF-8,
>> assumed to be CP1252, converting into UTF-8, which becomes wrong.
>
> I think you need to tell Wget that the page encoding is UTF-8, by
> using the --remote-encoding switch. Did you try that?
>
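
The conversion Cameron describes can be reproduced outside of wget. The following is only a minimal sketch in Python, not wget's actual code path; the helper name `reencode` is hypothetical. It decodes the percent-encoded bytes of the link under an assumed charset and then percent-encodes the result again as UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes


def reencode(percent_encoded: str, assumed_charset: str) -> str:
    """Decode percent-encoded bytes under an assumed charset, then
    percent-encode the resulting text again as UTF-8.

    Hypothetical helper for illustration; this is not how wget is implemented.
    """
    raw = unquote_to_bytes(percent_encoded)   # b'\xe2\x80\x99'
    text = raw.decode(assumed_charset)        # interpretation depends on charset
    return quote(text)                        # quote() encodes as UTF-8 by default


# Correct assumption: the bytes survive the round trip unchanged.
print(reencode("%E2%80%99", "utf-8"))    # %E2%80%99

# Wrong assumption: each byte is read as a separate CP1252 character,
# and each of those characters is then encoded as UTF-8 on its own.
print(reencode("%E2%80%99", "cp1252"))   # %C3%A2%E2%82%AC%E2%84%A2
```

If a request for a path like /%C3%A2%E2%82%AC%E2%84%A2 shows up in the debug output instead of /%E2%80%99, that would be consistent with the page having been treated as CP1252.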
Cameron's HTML file contains a 'meta' tag with the attribute 'charset=utf-8',
so wget should detect the encoding and convert the URL correctly.

And I can confirm that wget is working properly here. My version is 1.20.3
and I am working on Linux.

I put this file onto my local Apache web server and named it quote.html:

<!DOCTYPE html><html><head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>RIGHT SINGLE QUOTE TEST</title>
</head><body>
<a href="%E2%80%99">test</a>
</body></html>

My command line is

wget -d -r http://localhost/quote.html

Output is
...
Decided to load it.
URI encoding = »utf-8«
Enqueuing http://localhost/%E2%80%99 at depth 1
Queue count 1, maxcount 1.
[IRI Enqueuing »http://localhost/%E2%80%99« with »utf-8«
Dequeuing http://localhost/%E2%80%99 at depth 1
Queue count 0, maxcount 1.
Converted file name 'localhost/’' (UTF-8) -> 'localhost/’' (UTF-8)
--2019-10-11 18:06:21--  http://localhost/%E2%80%99
...
---request begin---
GET /%E2%80%99 HTTP/1.1
Referer: http://localhost/quote.html
User-Agent: Wget/1.20.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: localhost
Connection: Keep-Alive

---request end---
...

@Cameron: Your wget version seems ok, so I am a bit clueless right now...

Could you give me the output of 'wget --version'?

Could you test in the same way as I did above to see whether that is
reproducible for you or not?

Regards, Tim
