Here's a hostile URL that "urlparse.urlparse" seems to have mis-parsed.
====
http://[EMAIL PROTECTED]&xUDysvTbzZZOaymjQ2oYIx2AvMdJ1WQfjP02wIBBQBb1EVZAqmmGunxrcyGx1AcfegWUUYtaZfRW434O5Qn6InSMUZXgF5e3KzJbCntBGOj7pv31zab&action=login-run&passkey=e84239c9da59dbeb61d4d45db2cc5840&info_hash=%c9q%be%fe%c6j%ca%fd0%18%fe%23J%bd%89%d3%06L%fdV&info_hash=%18%9d%fb%15v%c0A%1f%c8%dds%0f%17%99%ceQ%83%a0%3e%27&info_hash=%df%f0%1c%5e%d75%b2%7d%e6D%0d%3e%d8%fbZ%5c%de%2ae%93&https://www.midamericabank.com/my_acccounts/default.aspxL0PWSjXev6xlkMTqVKFbLUgrh8CBquCchip4PuQDWYLYpzDGOFkLZyY
====
What we get back in the "accesshost" field (i.e. the domain name) is

====
'[EMAIL PROTECTED]&xUDysvTbzZZOaymjQ2oYIx2AvMdJ1WQfjP02wIBBQBb1EVZAqmmGunxrcyGx1AcfegWUUYtaZfRW434O5Qn6InSMUZXgF5e3KzJbCntBGOj7pv31zab&action=login-run&passkey=e84239c9da59dbeb61d4d45db2cc5840&info_hash=%c9q%be%fe%c6j%ca%fd0%18%fe%23J%bd%89%d3%06L%fdV&info_hash=%18%9d%fb%15v%c0A%1f%c8%dds%0f%17%99%ceQ%83%a0%3e%27&info_hash=%df%f0%1c%5e%d75%b2%7d%e6D%0d%3e%d8%fbZ%5c%de%2ae%93&https:'
====

which is wrong.  Something far out in that URL is breaking urlparse, and it's 
not able to extract the domain name properly.

It's not a Unicode issue; I forced the data to "str" and it still mis-parses.

I'm trying to construct a shorter string that fails.  More to follow.
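
A minimal sketch of what such a shorter test case might look like.  It is
only a guess at the cause: the returned value above stops at "https:", which
is just before the first "/" that follows the leading "http://", so the
sketch assumes the parser simply treats everything up to that first "/" as
the network location.  The URL and host names here are made up for
illustration, and the code uses Python 3's "urllib.parse" rather than the
older "urlparse" module.

```python
from urllib.parse import urlparse  # Python 3 location of the old urlparse module

# Hypothetical short URL (not the real phishing URL): there is no "/"
# after the host until the one inside the embedded "https://".
url = "http://user@example.com&action=login&https://bank.example/accounts"

parts = urlparse(url)
netloc = parts.netloc   # everything between "http://" and the first "/"
path = parts.path       # whatever follows that first "/"

print(repr(netloc))
print(repr(path))
```

If the guess is right, the network location swallows the query-like text up
to and including "https:", the same shape as the bad result above.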

(Yes, another error associated with the wonderful world of parsing hostile
sites in Python.  This is from a phishing attack, and that URL is in PhishTank.)

                                        John Nagle
                                        SiteTruth