[issue1637] urlparse.urlparse misparses URLs with query but no path

2008-01-05 Thread vila

Changes by vila:


--
nosy: +vila

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1637
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1637] urlparse.urlparse misparses URLs with query but no path

2008-01-05 Thread Guido van Rossum

Guido van Rossum added the comment:

Backport to 2.5.2:
Committed revision 59760.




[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-19 Thread John Nagle

John Nagle added the comment:

I tried downloading the latest rev of urlparse.py (59480) and it flunked
its own unit test, urlparse.test(). Two test cases fail, so I don't
want to try to fix the module until the last people to change it fix
their unit test problems.
The fix I provided should fix the problem I reported, but I'm not sure
if there's anything else wrong, since the module flunks its unit test.




[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-19 Thread Guido van Rossum

Guido van Rossum added the comment:

> I tried downloading the latest rev of urlparse.py (59480) and it flunked
> its own unit test, urlparse.test(). Two test cases fail.

That's not the official test -- that code should probably be deleted. 
The real test is in Lib/test/test_urlparse.py.  Please ignore that test.




[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-18 Thread Guido van Rossum

Guido van Rossum added the comment:

Would you mind submitting a proper patch for Python 2.5 and/or 2.6
generated by svn diff relative to an (anonymous) checkout, and adding
a unit test?  Then I'd be happy to accept this and if it makes it in
time for the 2.5.2 release we'll fix it there.

--
nosy: +gvanrossum
priority:  -> normal




[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-16 Thread John Nagle

New submission from John Nagle:

urlparse.urlparse will mis-parse URLs which have a / after a ?.

>>> sa1 = 'http://example.com?blahblah=/foo'
>>> sa2 = 'http://example.com?blahblah=foo'
>>> print urlparse.urlparse(sa1)
('http', 'example.com?blahblah=', '/foo', '', '', '') # WRONG
>>> print urlparse.urlparse(sa2)
('http', 'example.com', '', '', 'blahblah=foo', '') # RIGHT

That's wrong. RFC 3986 (Uniform Resource Identifier (URI): Generic
Syntax), page 23, says:

The characters slash (/) and question mark (?) may represent data
within the query component.  Beware that some older, erroneous
implementations may not handle such data correctly when it is used as
the base URI for relative references (Section 5.1), apparently
because they fail to distinguish query data from path data when
looking for hierarchical separators.

 So urlparse is an older, erroneous implementation.  Looking
 at the code for urlparse, it references RFC1808 (1995), which
 was a long time ago, three revisions back.

 Here's the bad code:

 def _splitnetloc(url, start=0):
     for c in '/?#': # the order is important!
         delim = url.find(c, start)
         if delim >= 0:
             break
     else:
         delim = len(url)
     return url[start:delim], url[delim:]

 That's just wrong.  The domain ends at the first appearance of
 any character in '/?#', but that code returns the text before the
 first '/' even if there's an earlier '?'.  A URL/URI doesn't
 have to have a path, even when it has query parameters. 
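
The failure can be reproduced outside the module. The sketch below copies the logic quoted above into a standalone function. The name old_splitnetloc is hypothetical, the code is written for Python 3 (an assumption, since the report uses Python 2), and the input is the URL with its 'http:' scheme already stripped, so the search starts at offset 2, just past the leading '//'.

```python
def old_splitnetloc(url, start=0):
    """Standalone copy of the reported logic (hypothetical name)."""
    # '/' is tried first, so a '/' anywhere in the URL wins over an
    # earlier '?' or '#'.
    for c in '/?#':
        delim = url.find(c, start)
        if delim >= 0:
            break
    else:
        delim = len(url)
    return url[start:delim], url[delim:]


# Scheme already removed; start=2 skips the leading '//'.
bad = old_splitnetloc('//example.com?blahblah=/foo', 2)
good = old_splitnetloc('//example.com?blahblah=foo', 2)
print(bad)   # the query leaks into the domain part
print(good)  # no '/' in the query, so the '?' is found
```

With no '/' in the query the loop falls through to '?', which is why only the first URL is split in the wrong place.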

OK, here's a fix to urlparse, replacing _splitnetloc.  I didn't use
a regular expression because urlparse doesn't import re, and I
didn't want to change that.

def _splitnetloc(url, start=0):
    delim = len(url)  # position of end of domain part of url, default is end
    for c in '/?#':  # look for delimiters; the order is NOT important
        wdelim = url.find(c, start)  # find first of this delim
        if wdelim >= 0:  # if found
            delim = min(delim, wdelim)  # use earliest delim position
    return url[start:delim], url[delim:]  # return (domain, rest)
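
As a rough check of the replacement, the sketch below runs the same logic as a standalone function and compares it with what a current Python 3 standard library reports for the same URL. The name fixed_splitnetloc is hypothetical, and using Python 3's urllib.parse as the reference is an assumption that is not part of the original report.

```python
from urllib.parse import urlsplit


def fixed_splitnetloc(url, start=0):
    """Standalone copy of the proposed fix (hypothetical name)."""
    delim = len(url)  # default: the domain runs to the end of the URL
    for c in '/?#':
        wdelim = url.find(c, start)
        if wdelim >= 0:
            delim = min(delim, wdelim)  # keep the earliest delimiter
    return url[start:delim], url[delim:]


# Scheme already removed; start=2 skips the leading '//'.
domain, rest = fixed_splitnetloc('//example.com?blahblah=/foo', 2)
print(domain, rest)

# Reference result from the standard library for the full URL.
parts = urlsplit('http://example.com?blahblah=/foo')
print(parts.netloc, parts.path, parts.query)
```

Taking the minimum over all three delimiters makes the result independent of the order in which they are searched.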
