[issue20271] urllib.parse.urlparse() accepts wrong URLs

2022-01-17 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2022-01-16 Thread Irit Katriel


Change by Irit Katriel :


--
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 3.5, Python 3.6, Python 
3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2019-10-24 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

With Victor's PR 16780, "http://[::1]spam:80; would raise a ValueError.

ValueError: Invalid network location: '[::1]spam:80'

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2019-03-27 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

See also issue36338 for a possible security issue for host of value 
"benign.com[attacker.com]" (spam[::1] format) where attacker.com is parsed as 
the host name assuming presence of [ and ] to be a IPV6 address without 
validation of the value attacker.com inside [] to be a valid IPV6 address.

As a datapoint input "http://[::1]spam; raises exception in Java, golang and 
Ruby. Browser's JS console returns invalid URL. I too would like exception 
being raised but not sure at which level.

Ruby seems to use a regex : 
https://github.com/ruby/ruby/blob/trunk/lib/uri/rfc3986_parser.rb#L6
Java parseurl : 
http://hg.openjdk.java.net/jdk/jdk/file/c4c225b49c5f/src/java.base/share/classes/java/net/URLStreamHandler.java#l124
golang : 
https://github.com/golang/go/blob/50bd1c4d4eb4fac8ddeb5f063c099daccfb71b26/src/net/url/url.go#L587

See also https://url.spec.whatwg.org/#host-parsing

If input starts with U+005B ([), then:

If input does not end with U+005D (]), validation error, return failure.

Return the result of IPv6 parsing input with its leading U+005B ([) and 
trailing U+005D (]) removed.

--
nosy: +xtreak
versions: +Python 3.6, Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-22 Thread Martin Panter

Martin Panter added the comment:

To me, the only difference between urlsplit() and urlparse() is that urlsplit() 
does not treat parameters following a semicolon specially. The documentation 
tends to agree with this view. So whatever exceptions are raised by urlparse() 
should also be raised by urlsplit() for consistency.

Also, I doubt it is worth worrying about the finer details of splitport() or 
whatever, because they are meant to be deprecated and not used externally, 
except for backwards compatibility.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-06 Thread Demian Brecht

Demian Brecht added the comment:

 splitport() shouldn't handle auth, it should be called after auth is dropped 
 with splituser().

Why shouldn't splitport() handle auth? Are you suggesting that in the eyes of 
splitport() that user:password@host:port should be invalid?

 parse.splitport('user:password@host:80')
('user:password@host', '80')

Seems to be reasonable behaviour to me. Why add the artificial constraint of 
calling splituser() first?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-06 Thread Demian Brecht

Demian Brecht added the comment:

I think some further consideration around this change is worthwhile:

Currently, urllib.parse.split* methods seem to do inconsistent validation 
around the data they're splitting. For example:

(None for an invalid port)
 parse.splitnport('example.com:foo')
('example.com', None)

Whereas other split* methods do not:

(Auth part should be URL-encoded)
 parse.splituser('u@ser:p@ssw...@example.com:80')
('u@ser:p@ssword', 'example.com:80')

And others are just plain incorrect:

(:bad should be the port as defined by the ABNF 'authority = [ userinfo @ ] 
host [ : port ]')
 parse.splitport('example.com:bad')
('example.com:bad', None)

However, none of these cases (currently) raise exceptions.

Looking at urllib.parse, two large motivations behind it are splitting and 
parsing. In my mind, splitting should solely be responsible for splitting input 
based on RFC-defined delimiters. Parsing on the other hand, should be 
responsible for both splitting as necessary as well as input validation. It may 
also make sense to add simple validation functions to the module to comply with 
the batteries included philosophy, but that's a topic for another issue.

My concern with the proposed patch is that it adds further inconsistency to 
split* methods:

Before patch:

 parse.urlsplit('http://[::1]spam:80')
SplitResult(scheme='http', netloc='[::1]spam:80', path='', query='', 
fragment='')

After patch:

 parse.urlsplit('http://[::1]spam:80')
Traceback (most recent call last):
  File stdin, line 1, in module
  File /Volumes/src/p/cpython/Lib/urllib/parse.py, line 350, in urlsplit
netloc, url = _splitnetloc(url, 2)
  File /Volumes/src/p/cpython/Lib/urllib/parse.py, line 324, in _splitnetloc
raise ValueError('Invalid IPv6 URL')

(While the above examples still yield the same results and don't raise 
exceptions)

I do think that the validation should be done and I agree that an exception 
should be raised, but it should be done at the urlparse level, not on split (in 
this case, due to the change to _splitnecloc).

--
nosy: +demian.brecht

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-06 Thread Demian Brecht

Changes by Demian Brecht demianbre...@gmail.com:


--
stage: needs patch - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

With optimizations in issue23563 and weaken IPv6 check the implementation of 
urlsplit() can be faster.

$ ./python -m timeit -s from urllib.parse import urlparse, clear_cache -- 
urlparse('http://python.org:80'); clear_cache()
1 loops, best of 3: 86.3 usec per loop
$ ./python -m timeit -s from urllib.parse import urlparse, clear_cache -- 
urlparse('http://[2001:4802:7901:0:e60a:1375:0:5]:80'); clear_cache()
1 loops, best of 3: 88.6 usec per loop

--
Added file: http://bugs.python.org/file38322/urlparse_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

splitport() shouldn't handle auth, it should be called after auth is dropped 
with splituser().

The patch makes URL parsing slower, especially URLs with IPv6 addresses.

$ ./python -m timeit -s from urllib.parse import urlparse, clear_cache -- 
urlparse('http://python.org:80'); clear_cache()
Unpatched: 1 loops, best of 3: 70.3 usec per loop
Patched:   1 loops, best of 3: 105 usec per loop

$ ./python -m timeit -s from urllib.parse import urlparse, clear_cache -- 
urlparse('http://[2001:4802:7901:0:e60a:1375:0:5]:80'); clear_cache()
Unpatched: 1 loops, best of 3: 71.1 usec per loop
Patched:   1000 loops, best of 3: 239 usec per loop

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2015-03-01 Thread Martin Panter

Martin Panter added the comment:

I suspect the proposed patch here would make some of the special error handling 
for the “port” property redundant, and would compete with my patch for Issue 
20059, which raises ValueError in more cases when “port” is accessed, rather 
than returning None.

There should also be test cases (and maybe documentation or fixes?) to 
illustrate the behaviour of parsing:

* http://[fe80::a%25en1] (RFC 6874-style scoped IPv6 address with zone 
identifier)
* http://[v1.fe80::a+en1] (RFC 3986 IPvFuture format, using 
https://tools.ietf.org/html/draft-sweet-uri-zoneid-01 style address)

--
nosy: +vadmium

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-03-07 Thread STINNER Victor

STINNER Victor added the comment:

Oh, by the way my patch fixes also #18191.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-03-07 Thread STINNER Victor

STINNER Victor added the comment:

Here is a patch for Python 3.5 which breaks backward compatibility: urlparse 
functions now raise a ValueError if the IPv6 address, port or host is invalid.

Examples of invalid URLs:

- HTTP://WWW.PYTHON.ORG:65536/doc/#frag: 65536 is invalid
- http://www.example.net:foo;
- http://::1/;
- http://[127.0.0.1]/;
- http://[host]/;

According to unit tests, Python 3.4 is more tolerant: it only raises an error 
when the port number is read (obj.port) from an URL with an invalid port. There 
error is not raised when the whole URL is parsed.

Is it ok to break backward compatibility?

--
keywords: +patch
nosy: +gvanrossum, haypo
Added file: http://bugs.python.org/file34300/urlparse.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-03-07 Thread STINNER Victor

STINNER Victor added the comment:

My patch urlparse.patch may be modified to fix also #18191 in Python 2.7, 3.3 
and 3.4: splitport() should handle IPv6 ([::1]:80) and auth 
(user:passowrd@host) but not raises an exception.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-01-31 Thread Yury Selivanov

Yury Selivanov added the comment:

I think we should start raising an exception in 3.5 (backwards incompatible 
change to back-port it)

--
nosy: +yselivanov
versions: +Python 3.5 -Python 2.7, Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-01-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What should be correct behavior? Raise an exception, return '[::1]spam' as 
hostname, or left all as is?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-01-15 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

 import urllib.parse
 p = urllib.parse.urlparse('http://[::1]spam:80')
 p
ParseResult(scheme='http', netloc='[::1]spam:80', path='', params='', query='', 
fragment='')
 p.hostname
'::1'
 p.port
80

'http://[::1]spam:80' is invalid URL, but urllib.parse.urlparse() accepts it 
and just ignore the spam part.

--
components: Library (Lib)
messages: 208156
nosy: orsenthil, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: urllib.parse.urlparse() accepts wrong URLs
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20271] urllib.parse.urlparse() accepts wrong URLs

2014-01-15 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
dependencies: +urllib.parse doesn't work with empty port

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20271
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com