[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2021-12-10 Thread Irit Katriel


Irit Katriel  added the comment:

Reproduced on 3.11:

>>> urllib.parse.urlparse('http://www.google.com:/abc')
ParseResult(scheme='http', netloc='www.google.com:', path='/abc', params='', 
query='', fragment='')


The discussion above was mostly leaning towards won't fix, except for the 
suggestion to

> Based on the RFC, one could argue that urlunsplit should omit the colon.

--
nosy: +iritkatriel
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 2.7, Python 3.5, Python 
3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread R. David Murray

R. David Murray added the comment:

Yes, I'm afraid so.  But the rationale is presumably the same as it is in many 
RFCs: you should accept broken stuff if it can be interpreted unambiguously, 
but you should not *produce* the broken stuff.  So I'd say the RFC is correct 
as quoted.  It is then up to openstack whether or not it wants to accept such 
URLs in a given interface, and it if were me I'd base that decision on whether 
it was an 'internal' interface or an 'external' one.  But you can also argue 
that even an external interface could be strict, depending on the application 
domain of the interface.  urllib, on the other hand, needs to accept it, and we 
can't change it at this point for backward compatibility reasons if nothing 
else.

Based on the RFC, one could argue that urlunsplit should omit the colon.  That 
could break backward compatibility, too, though will a lot less likelyhood.  So 
we could still fix it in 3.7 if we decide we should.

--
nosy: +r.david.murray

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Amrith Kumar

Amrith Kumar added the comment:

Eric, Martin,

I'm sure this is an interpretation (and I'll be fair and give you both) that a 
URL of the form:

http://hostname.domain.tld:/path

should be held invalid. The rationale is per the section of the spec you quoted.

The other side of the argument is that the hostname and port are defined as:

hostname [ : port ] 

where port is defined as

*DIGIT

This implies that 0 digits is also valid.

I submit to you that the ambiguity would be removed if they:

- removed the paragraph telling people not to emit a : if there was no port, or
- changing the port definition to 1*DIGIT

Absent that, I believe that the paragraph implies that the intent was 1*DIGIT.

And therefore a : with no following number should generate an error.

I raise this because the behavior of urlparse() (the way it does things now) is 
being cited as the reason why other routines in OpenStack should handle this 
when the reason we were getting URL's with a : and no port was in fact an 
oversight.

So, in hearing the rationalization as being that "urlparse() does it, so ..." I 
figured I'd come to the source and see if we could make urlparse() do things 
unambiguously.

Now, if the reason it does this is because someone who came before me made the 
argument that a : with no port should be accepted, I guess I'm out of luck.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Eric V. Smith

Eric V. Smith added the comment:

And note that there are tests that explicitly check that the colon with no port 
works (via issue 20270).

Given that this behavior has been around for a while, and is explicitly 
allowed, I would recommend against changing this to an error.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Martin Panter

Martin Panter added the comment:

Also, this is in direct contradiction to Issue 20270.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Martin Panter

Martin Panter added the comment:

The Python documentation refers to RFC 3986, which allows an empty port string 
introduced by a colon, although it recommends against it: 


The “port” subcomponent of “authority” is designated by an optional port number 
in decimal following the host and delimited from it by a single colon. . . . 
URI producers and normalizers should omit the port component and its “:” 
delimiter if “port” is empty . . .

What problem are you trying to solve by raising an error?

See also Issue 20059, where accessing the “port” field now raises ValueError 
for out-of-range values, and Issue 20271, discussing invalid URL handling more 
generally.

--
nosy: +martin.panter

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Amrith Kumar

Amrith Kumar added the comment:

As requested ...

>>> urlparse.urlparse('http://www.google.com:/abc')
ParseResult(scheme='http', netloc='www.google.com:', path='/abc', params='', 
query='', fragment='')

I submit to you that this should generate an error.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Eric V. Smith

Eric V. Smith added the comment:

Can you please paste your code into a comment on this issue? Gist content has a 
habit of vanishing. Thanks!

--
nosy: +eric.smith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread SilentGhost

Changes by SilentGhost :


--
nosy: +orsenthil
versions: +Python 3.5, Python 3.6, Python 3.7 -Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)

2016-11-30 Thread Amrith Kumar

New submission from Amrith Kumar:

The urlparse library incorrectly accepts a URI which specifies an invalid host 
identifier.

Example:

http://www.example.com:/abc

Looking at the URI specifications, this is an invalid URI.

By definition, you are supposed to specify a "hostport" and a hostport is 
defined as:

https://www.w3.org/Addressing/URL/uri-spec.html

hostport
host [ : port ]

The BNF indicates that : is only valid if a port is also specified.

See current behavior; I submit to you that this should generate an exception.

https://gist.github.com/anonymous/8504f160ff90649890b5a2a82f8028b0

--
components: Library (Lib)
messages: 282086
nosy: amrith
priority: normal
severity: normal
status: open
title: urlparse.urlparse() parses invalid URI without generating an error 
(examples provided)
type: behavior
versions: Python 2.7, Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com