# skips all double-encoded [ui]ris because it reinterprets them, outside 
url.c:reencode_escapes(), probably in iri.c.
wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html

# works
wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html

Correct [ui]ri: 
http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html (200)
Incorrect [ui]ri: 
http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html (404)
# pcnt_decode(pcnt_decode("%252F") -> "%2F") -> "/"
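
To make the decoding chain above concrete, here is a minimal sketch in Python (not wget code). It assumes pcnt_decode behaves like a plain percent-decoder, for which urllib.parse.unquote stands in; the variable names are only illustrative.

```python
from urllib.parse import unquote

# File name as it appears in the correct (200) link.
original = "XLAT%252FXLATB.html"

# One decode pass turns "%25" into "%", giving the name seen in the 404 request.
once = unquote(original)

# A second pass turns "%2F" into a real path separator.
twice = unquote(once)

print(original, "->", once, "->", twice)
```

Each extra decode pass changes which resource the link names, so a link extracted from a page needs to be passed through without any decoding at all.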

Simple-but-incomplete hackaround: use --no-iri

To improve compatibility with mirroring international sites, the iri code path 
could approximate behavior of url.c/url_parse() by avoiding unnecessary 
modification to --mirror extracted [ui]ris, possibly around the time it 
adds/dequeues them to/from the queue.
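
A rough sketch of the "avoid unnecessary modification" idea, in Python rather than wget's C, and not taken from the wget source. The helper name reencode_preserving_escapes and the SAFE character set are assumptions made for illustration: existing %XX sequences are copied through verbatim, and only characters that are unsafe in a URL get escaped.

```python
import re
from urllib.parse import quote

# Assumption: treat RFC 3986 reserved and unreserved characters as safe.
SAFE = ":/?#[]@!$&'()*+,;=-._~"

# Matches an already-valid percent escape such as %2F or %25.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def reencode_preserving_escapes(url: str) -> str:
    """Escape unsafe characters; never decode or re-encode existing %XX."""
    out = []
    pos = 0
    for m in _ESCAPE.finditer(url):
        # Escape the literal text before this escape sequence.
        out.append(quote(url[pos:m.start()], safe=SAFE))
        # Copy the existing escape through untouched.
        out.append(m.group(0))
        pos = m.end()
    # Escape whatever follows the last escape sequence.
    out.append(quote(url[pos:], safe=SAFE))
    return "".join(out)
```

With this approach a double-encoded link such as .../instruction/XLAT%252FXLATB.html comes out of the queue byte-for-byte identical to how it went in, while a link containing a raw space or non-ASCII character still gets escaped.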

Best,
Barry Allard
