Hi Dale,

On 06/03/17 16:47, Dale R. Worley wrote:
> Orange Tsai <[email protected]> writes:
>> # This will work
>> $ wget 'http://127.0.0.1%0d%0aCookie%3a hi%0a/'
>
> Not even considering the effect on headers, it's surprising that wget
> doesn't produce an immediate error, since
> "127.0.0.1%0d%0aCookie%3a hi%0a" is syntactically invalid as a host
> part. Why doesn't wget's URL parser detect that?

Simply because it first splits the URL into several parts according to
the delimiters, and only then decodes the percent-encoding. For the host
part it additionally checks whether it's an IP address and does the IDNA
processing, but yeah, you raise a good point. Other than that, the host
part is treated the same way as the other parts.

From a quick look, RFC 1034 says a domain name should have "any one of
the 52 alphabetic characters A through Z in upper case and a through z
in lower case", and digits, basically. Do you think it's enough to just
reject anything outside [a-z0-9\.\-_], or is there something else to be
done?

> I'm sure the new
> patch is an improvement, but it's surprising that the old code didn't
> detect that was an invalid URL anyway, since it contains characters that
> aren't permitted in those locations.
>
> Dale
>
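
To make the ordering issue concrete, here is a rough sketch in Python
(not wget's actual C code) of "split on delimiters first, decode
afterwards", followed by the kind of character check suggested above.
The names split_then_decode, host_is_valid and HOST_OK are made up for
illustration, and the pattern is only the character set floated in this
mail; it ignores IPv6 literals and the IDNA handling.

```python
import re
from urllib.parse import unquote

# Hypothetical whitelist: the character set proposed in this thread,
# matched case-insensitively. Not wget's real validation.
HOST_OK = re.compile(r"[a-z0-9._-]+", re.IGNORECASE)


def split_then_decode(url):
    """Split on the delimiters first, then percent-decode each part."""
    rest = url.split("://", 1)[1]
    host_raw, _, path_raw = rest.partition("/")
    # Decoding happens after the split, so an encoded CR/LF in the host
    # part never confuses the splitter; it only shows up afterwards.
    return unquote(host_raw), "/" + unquote(path_raw)


def host_is_valid(host):
    """Accept the host only if every character is in the whitelist."""
    # fullmatch (rather than match with '$') so a trailing newline
    # cannot slip through.
    return HOST_OK.fullmatch(host) is not None


host, path = split_then_decode("http://127.0.0.1%0d%0aCookie%3a hi%0a/")
print(repr(host))                  # decoded host now contains CR and LF
print(host_is_valid(host))         # rejected by the whitelist
print(host_is_valid("127.0.0.1"))  # a plain IPv4 host still passes
```

Running the check on the decoded host is the point here: checked before
decoding, the same string consists only of harmless-looking characters
plus '%' and a space.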
