On Tuesday, March 7, 2017 10:19:15 PM CET Ander Juaristi wrote:
> Hi Dale,
> 
> On 06/03/17 16:47, Dale R. Worley wrote:
> > Orange Tsai <[email protected]> writes:
> >> # This will work
> >> $ wget 'http://127.0.0.1%0d%0aCookie%3a hi%0a/'
> > 
> > Not even considering the effect on headers, it's surprising that wget
> > doesn't produce an immediate error, since
> > "127.0.0.1%0d%0aCookie%3a hi%0a" is syntactically invalid as a host
> > part.  Why doesn't wget's URL parser detect that?
> 
> Simply because it first splits the URL into several parts according to
> the delimiters, and then decodes the percent-encoding.
> 
> Additionally for the host part it also checks whether it's an IP address
> and the IDNA stuff, but yeah you raise a good point. Other than that the
> host part is treated similarly to the other parts.
> 
> So all in a rush I see RFC 1034 says a domain name should have "any one
> of the 52 alphabetic characters A through Z in upper case and a through
> z in lower case", and digits, basically.
> 
> Do you think it's enough to just blacklist anything outside
> [a-z0-9\.\-_], or is there something else to be done?

We are talking about URL validity, not DNS (that is what we use IDNA 2008 /
TR46 for). Wget should comply with RFC3986 + RFC7320, basically. There are some
subtleties of course; a good starting point is
https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/. Thanks, Daniel.
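
To make the RFC3986 point concrete, here is a rough Python sketch (not wget
code) of checking a host against the reg-name rule from RFC 3986, section
3.2.2, before any percent-decoding happens. The helper name
host_is_acceptable is made up for illustration. The second step, rejecting
control characters and whitespace after decoding, is my own assumption about
what a client would want on top of the grammar, since pct-encoded octets such
as %0d%0a are syntactically allowed in reg-name. IP-literals in brackets are
not handled here.

```python
import re
from urllib.parse import unquote

# RFC 3986, section 3.2.2:
#   reg-name = *( unreserved / pct-encoded / sub-delims )
# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
_REG_NAME = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def host_is_acceptable(raw_host: str) -> bool:
    """Hypothetical helper: validate a still-encoded host component."""
    # Step 1: the raw host must match the reg-name grammar.
    # A literal space, for example, is rejected here.
    if not _REG_NAME.fullmatch(raw_host):
        return False
    # Step 2 (assumption, stricter than the grammar): after decoding,
    # refuse control characters and whitespace such as CR and LF.
    decoded = unquote(raw_host)
    return not any(ord(c) < 0x21 or ord(c) == 0x7F for c in decoded)
```

With this sketch the host from the report is refused at step 1 because of the
literal space, and the same host without the space is refused at step 2
because it decodes to CR/LF.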

Now we see that 'relaxed' parsing has its caveats and may easily result in
security issues.

Tim
