On Tuesday, March 7, 2017 10:19:15 PM CET Ander Juaristi wrote:
> Hi Dale,
>
> On 06/03/17 16:47, Dale R. Worley wrote:
> > Orange Tsai <[email protected]> writes:
> >> # This will work
> >> $ wget 'http://127.0.0.1%0d%0aCookie%3a hi%0a/'
> >
> > Not even considering the effect on headers, it's surprising that wget
> > doesn't produce an immediate error, since
> > "127.0.0.1%0d%0aCookie%3a hi%0a" is syntactically invalid as a host
> > part. Why doesn't wget's URL parser detect that?
>
> Simply because it first splits the URL into several parts according to
> the delimiters, and then decodes the percent-encoding.
>
> Additionally for the host part it also checks whether it's an IP address
> and the IDNA stuff, but yeah you raise a good point. Other than that the
> host part is treated similarly to the other parts.
>
> So all in a rush I see RFC 1034 says a domain name should have "any one
> of the 52 alphabetic characters A through Z in upper case and a through
> z in lower case", and digits, basically.
>
> Do you think it's enough to just blacklist anything outside
> [a-z0-9\.\-_], or is there something else to be done?
We are talking about URL validity, not DNS (that is what we use IDNA 2008 /
TR46 for).

Wget should comply with RFC 3986 + RFC 7320, basically. There are some
subtleties of course; a good starting point is
https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/

Thanks, Daniel. Now we see that 'relaxed' parsing has its caveats and may
easily result in security issues.

Tim
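
For illustration only, here is a rough sketch of the split-then-decode order described in the quoted text, plus a check on the decoded host. It is written in Python, not taken from wget's C sources, and the names extract_host and host_is_acceptable are made up for this example. It assumes a deliberately strict policy: after percent-decoding, the host may contain only the RFC 3986 "unreserved" and "sub-delims" characters. RFC 3986 itself allows pct-encoded octets in a reg-name, so treat this as one possible policy rather than the rule wget should adopt.

```python
import re
from urllib.parse import unquote

# RFC 3986 "unreserved" (ALPHA / DIGIT / "-" / "." / "_" / "~") plus "sub-delims".
# Assumption: we apply this set to the *decoded* host, which is stricter than the RFC.
_REG_NAME = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=]*")


def extract_host(url):
    """Hypothetical helper: split on delimiters only, without decoding anything."""
    _, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("no scheme separator in URL")
    # The authority ends at the first "/", "?" or "#".
    authority = re.split(r"[/?#]", rest, maxsplit=1)[0]
    # Drop any userinfo, then any port (IPv6 literals are ignored for brevity).
    hostport = authority.rsplit("@", 1)[-1]
    return hostport.split(":", 1)[0]


def host_is_acceptable(raw_host):
    """Decode percent-encoding first, then validate what would actually be used."""
    decoded = unquote(raw_host)
    return bool(decoded) and _REG_NAME.fullmatch(decoded) is not None


if __name__ == "__main__":
    for url in ("http://example.com/",
                "http://127.0.0.1%0d%0aCookie%3a hi%0a/"):
        host = extract_host(url)
        print(repr(host), host_is_acceptable(host))
```

The point of the sketch is the ordering: because the encoded CR, LF and ":" are not delimiters, the splitting step passes them through untouched, and only a check made after decoding sees the control characters and the space.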
