Hi Dale,

On 06/03/17 16:47, Dale R. Worley wrote:
> Orange Tsai <[email protected]> writes:
>> # This will work
>> $ wget 'http://127.0.0.1%0d%0aCookie%3a hi%0a/'
> 
> Not even considering the effect on headers, it's surprising that wget
> doesn't produce an immediate error, since
> "127.0.0.1%0d%0aCookie%3a hi%0a" is syntactically invalid as a host
> part.  Why doesn't wget's URL parser detect that?

Simply because it first splits the URL into several parts according to
the delimiters, and then decodes the percent-encoding.
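To illustrate the ordering problem, here is a minimal sketch in Python. It is not wget's actual code: the function name naive_split_then_decode is hypothetical and the splitting is heavily simplified (no port, userinfo or query handling). It only shows that when the delimiters are located first and the percent-decoding happens afterwards, the CR/LF bytes end up inside the host string without ever being seen by the splitter.

```python
from urllib.parse import unquote

def naive_split_then_decode(url):
    """Hypothetical sketch: split on delimiters first, percent-decode afterwards."""
    scheme, _, rest = url.partition("://")
    # The host is taken to end at the first literal '/'; at this point
    # "%0d%0a" is still just harmless-looking text.
    raw_host, _, raw_path = rest.partition("/")
    # Decoding only now turns "%0d%0a" into a real CR LF inside the host.
    return scheme, unquote(raw_host), "/" + unquote(raw_path)

scheme, host, path = naive_split_then_decode(
    "http://127.0.0.1%0d%0aCookie%3a hi%0a/"
)
print(repr(host))
```

If a host string like that is later copied verbatim into a "Host:" header, the embedded line break starts a new header line, which is the injection Orange Tsai reported.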

For the host part it additionally checks whether it's an IP address and
does the IDNA processing, but other than that the host part is treated
like the other parts. So yes, you raise a good point.

From a quick look, RFC 1034 says a domain name should consist of "any
one of the 52 alphabetic characters A through Z in upper case and a
through z in lower case", plus digits, basically.

Do you think it's enough to just reject any host containing characters
outside [a-z0-9\.\-_], or is there something else to be done?
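Roughly what I have in mind, sketched in Python rather than as a patch against wget. The names HOST_ALLOWED and host_is_acceptable are made up for this example, and matching case-insensitively is my assumption, since the RFC text mentions both upper and lower case. The check is meant to run on the host after percent-decoding.

```python
import re

# Character set from the question above, matched case-insensitively
# (an assumption; the set as written is lower case only).
HOST_ALLOWED = re.compile(r"[a-z0-9._-]+", re.IGNORECASE)

def host_is_acceptable(decoded_host):
    """Return True only if every character of the decoded host is allowed."""
    # fullmatch() is used instead of a '$' anchor, because '$' would also
    # match just before a trailing newline.
    return HOST_ALLOWED.fullmatch(decoded_host) is not None

print(host_is_acceptable("127.0.0.1"))
print(host_is_acceptable("127.0.0.1\r\nCookie: hi\n"))
```

As written this would also reject bracketed IPv6 literals such as [::1], and a non-ASCII IDN host would only pass once it has been converted to its ASCII form, so those cases would need separate handling.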

> I'm sure the new
> patch is an improvement, but it's surprising that the old code didn't
> detect that was an invalid URL anyway, since it contains characters that
> aren't permitted in those locations.
> 
> Dale
> 
