[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2015-03-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Trailing spaces, newlines and like were disabled in put_header() in issue22928. 
Now we can start long-standing deprecation process for headers that don't 
conform RFC 7230.

--
nosy: +serhiy.storchaka
stage: commit review - needs patch
versions: +Python 3.5 -Python 2.7, Python 3.2, Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2015-02-22 Thread Martin Panter

Martin Panter added the comment:

See Issue 22928 for a patch making the “http.client” module even more stricter 
with header field names and values.

--
nosy: +vadmium

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2014-09-23 Thread karl

karl added the comment:

Just a follow up for giving the stable version of the now new RFC version for 
HTTP 1.1

HTTP header field names parsing
http://tools.ietf.org/html/rfc7230#section-3.2.4

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-06 Thread Piotr Dobrogost

Changes by Piotr Dobrogost p...@bugs.python.dobrogost.net:


--
nosy: +piotr.dobrogost

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-04 Thread R. David Murray

R. David Murray added the comment:

A crazy idea that occurred to me was to create an rfc822-style-header 
management module, and share it between email, http, and urllib.  We'd 
probably break too many things backward-compatibility wise if we did that, but 
I still think it is an interesting thought :)

On the other hand, having thought about this particular issue some more: in the 
email package we have the constraint of needing to be able to exactly reproduce 
the input data, whereas in the http world that constraint does not apply.  So 
in the http world, given that headers are *already* being transformed by the 
code in question (title casing), it seems reasonable that blank stripping also 
be done, even just from an API standpoint.

Really I guess the only question is how likely this is to break existing code.  
I'm pretty sure it is small enough that doing this in 3.4 would be fine, but I 
don't know how to estimate if it is small enough to also change it in 
maintenance releases.  Since this particular bit is a new standard, maybe we 
just go with 3.4?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-04 Thread karl

karl added the comment:

R. David.:
 A crazy idea that occurred to me was to create an rfc822-style-header 
 management module, and share it between email, http, and urllib.

Yes it is basically what I had in mind when I said:

Maybe the way forward in the future is to have a header factory shared by all 
HTTP libs?

I'm not sure if it's easy to share in between emails and HTTP. Emails are now 
defined by RFC5322:
https://tools.ietf.org/html/rfc5322#section-2.2

2.2. Header Fields


   Header fields are lines beginning with a field name, followed by a
   colon (:), followed by a field body, and terminated by CRLF.  A
   field name MUST be composed of printable US-ASCII characters (i.e.,
   characters that have values between 33 and 126, inclusive), except
   colon.  A field body may be composed of printable US-ASCII characters
   as well as the space (SP, ASCII value 32) and horizontal tab (HTAB,
   ASCII value 9) characters (together known as the white space
   characters, WSP).  A field body MUST NOT include CR and LF except
   when used in folding and unfolding, as described in section
   2.2.3.  All field bodies MUST conform to the syntax described in
   sections 3 and 4 of this specification.

Maybe it's doable and worth exploring but I have the feeling we would end up 
with something along

field-name : field-value

and all the rules for field-name and field-value being different for HTTP and 
email, because they really are. Folding, set of characters, etc. :)

Another thing which should be also the opportunity for opening another issue. 
HTTP headers and python dictionaries is not a very good match. :) But this is 
drifting off-topic. :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-04 Thread R. David Murray

R. David Murray added the comment:

Aren't the folding rules are the same?  The character set rules are different, 
I think, but the email package is going to be flexible in that regard.

The email package also uses a data structure that is not a python dictionary 
(it is actually a list with an extra dict-like interface), and its features may 
well be useful for handling http headers.

But you are right, we are wandering off topic :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread R. David Murray

R. David Murray added the comment:

Given that this is an RFC violation it looks appropriate as a bug fix for all 
active versions to me.  The patch looks good, though I might decide to break up 
the test.

--
nosy: +r.david.murray
stage:  - commit review
type:  - behavior
versions: +Python 2.7, Python 3.2, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread R. David Murray

R. David Murray added the comment:

Ah, but that's a draft and not approved.  Hmm.  There is a possibility this 
change would break code, if the code tries to look up a header by the same key 
it used to set it.  On the other hand, the key use for the set is already 
modified by the existing code...difficult call.

Having looked in more detail at the test, I think it actually belongs in 
test_urllib2.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread R. David Murray

R. David Murray added the comment:

Here is a modified patch with the tests moved to test_urllib2.  I'll give 
people some time to comment on whether this should be applied at all, and if so 
if it should be backported.  I'm leaning toward doing both, at the moment.

Karl, thanks for the report and patch.  The urllib tests could use a bit of 
reorganization to make them more discoverable, but I don't know that that is 
very likely to happen :)

The case of doing an add_header with newlines in it is probably worth a 
separate issue.  We fixed a similar issue in the email package a while back, 
but that one was much more likely to be an issue since it is much more likely 
for a program to be adding headers to an email message based on user input than 
it is to be adding headers to a Request based on user input.  But I'm sure it 
does happen, so it is probably worth fixing.

--
Added file: http://bugs.python.org/file29297/request_header_space_removal.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread Senthil Kumaran

Senthil Kumaran added the comment:

David  Karl - 

I had been thinking on this. My understanding of the RFC implies that server 
should reject when the headers contain the whitespace and I had a little 
concern if the client library should feel free to cleanup a wrongly set 
headers? Would it be a good idea to see what curl is doing?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread R. David Murray

R. David Murray added the comment:

Good point.  Seeing what curl does sounds like a good idea.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread Senthil Kumaran

Senthil Kumaran added the comment:

Looks like curl is sending the headers without removing spaces. 
Check Raw here link here. (The link would probably be online a week)

http://requestb.in/1kfodmj1?inspect

$ curl --header X-MyHeader: 123 http://requestb.in/1kfodmj1
ok
$ curl --header   X-MyHeader:123http://requestb.in/1kfodmj1
ok
$ curl --header   X-MyHeader\nY-Header:123 
http://requestb.in/1kfodmj1
ok

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread karl

karl added the comment:

Hello,

So I tested a bit. The production rules defined by the specification are clear. 
Spaces before and after are forbidden. 

header-field   = field-name : OWS field-value BWS
field-name = token
field-value= *( field-content / obs-fold )
field-content  = *( HTAB / SP / VCHAR / obs-text )
obs-fold   = CRLF ( SP / HTAB )
   ; obsolete line folding
   ; see Section 3.2.4

and 

  token = 1*tchar

and tchar as 

  tchar = ! / # / $ / % /  / ' / * / + / - / . /
   ^ / _ / ` / | / ~ / DIGIT / ALPHA

Here are the production rules for HTTP headers for messages (so both Request 
and Responses). 

You can have funky headers, I guess that would be interesting to add to the 
urllib tests too. Basically to have something in the library, which check if 
header contains the tchar characters and sends back a warning of exception when 
not part of it.

curl has a bug too, IMHO. Though, one might argue that it is practical for 
testing bugs. :)

On the side of parsing it's clear for the trailing space but unknown for the 
leading spaces. I sent a long email explaining the issue to the HTTP WG.

See http://lists.w3.org/Archives/Public/ietf-http-wg/2013JanMar/1166.html

Let's see what will be the answers

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread Senthil Kumaran

Senthil Kumaran added the comment:

Oh wow. Thank you very much Karl for the care. I am having the same
inclination are you too, but determining a definite answer is really
helpful before going ahead into making the change.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread Senthil Kumaran

Senthil Kumaran added the comment:

The curl example also suggest to think about pragamatic de-facto
stuff. Will removing the spaces cause any breakage? I can say for
sure. But if someone can think of it, it would be good for at least us
know.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread karl

karl added the comment:

OK. I'm inclined to think that we should both remove trailing and leading 
spaces/tabs should be removed.

Reasons: 

1. Production rules forbid them.
2. Trailing spaces 
   2.a Conformant servers will ignore with a 400 bad request (opportunity for 
another bugs?)
   2.b Conformant proxies will rewrite the header by removing spaces.
3. Leading spaces are continuation lines. See below.

I had completely missed that. The syntax for headers is:

header-field   = field-name : OWS field-value BWS
field-name = token
field-value= *( field-content / obs-fold )
field-content  = *( HTAB / SP / VCHAR / obs-text )
obs-fold   = CRLF ( SP / HTAB )
   ; obsolete line folding
   ; see Section 3.2.4

obs-fold is about line folding which is basically a header that would look like:

foo: bar\n
 blah: something

which could be the equivalent of:

foo: bar blah: something

with foo the header field-name and bar blah: something the header 
field-value.

which is clearly not the intent of an author doing:

request.add_header('Accept', 'text/html')
request.add_header(' User-Agent', 'foobar/1.0')

because this would be parsed by a conformant server as

Accept: text/html User-Agent: foobar/1.0

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread R. David Murray

R. David Murray added the comment:

It is a bug in the program, though, and not particularly in the client library. 
 As mentioned, it can even be useful for testing servers.

In the email package we faithfully reproduce such headers if they are passed 
in.  The only one we disallow is something that looks like a field label 
preceded by a newline, since that could be used for a header injection attack 
if the field value comes from user input.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-03 Thread karl

karl added the comment:

R. David Murray,

You are right it is not specific to the client library. HTTP headers are part 
of the message (Request/Response) with both the same constraints. Constraints 
are put on receivers (receiving a message) and senders (sending a message) of 
the message (which is not specifically related to client or server).

Maybe the way forward in the future is to have a header factory shared by all 
HTTP libs? I noticed that http.client and http.server had similar issues: 

in http.server
   send_header
in http.client
   putheader

Which are similar features aka constructing headers for sending with the 
message. 


And what would be the elegant way to solve this current bug?

Ah… before I forget… The WG is having a meeting in 2 weeks. To make a summary 
of the HTTPBIS work. See the agenda. 
http://tools.ietf.org/wg/httpbis/agenda?item=agenda-86-httpbis.html

The current documents are in Last Call with no issues unresolved.
http://trac.tools.ietf.org/wg/httpbis/trac/report/20

So if R. David is worried that it will change, we can wait a bit more before 
taking actions, if we are going the way of removing leading/trailing spaces.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-02 Thread karl

karl added the comment:

http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l359

def add_header(self, key, val):
# useful for something like authentication
self.headers[key.capitalize()] = val

and http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l271
in __init__ of class Request:

for key, value in headers.items():
self.add_header(key, value)

Tests seem to be there, but there are none. Is there a reason why there are no 
tests?
http://hg.python.org/cpython/file/3.3/Lib/test/test_urllib2.py#l98

Or should I add a test in
http://hg.python.org/cpython/file/3.3/Lib/test/test_urllib.py

I'm confused :) 
I need guidance.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-03-02 Thread karl

karl added the comment:

I created 4 tests for testing trailing and leading spaces on

* add_unredirected_header()
* add_header()

and modified the functions.

Tests passed.

→ ./python.exe Lib/test/test_urllib2net.py 
[…]

test_headers_with_spaces (__main__.OtherNetworkTests) ... ok

[…]
--
Ran 16 tests in 17.148s

OK (skipped=1)
[137988 refs]

See the patch issue-17322-1.patch

--
keywords: +patch
Added file: http://bugs.python.org/file29292/issue-17322-1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-02-28 Thread karl

Changes by karl karl+pythonb...@la-grange.net:


--
title: urllib.request add_header() currently allows trailing spaces - 
urllib.request add_header() currently allows trailing spaces (and other weird 
stuff)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)

2013-02-28 Thread karl

karl added the comment:

Yet another one leading spaces :(

 req = urllib.request.Request('http://www.example.com/')
 req.header_items()
[]
 req.add_header(' Foo3', 'Ooops')
 req.header_items()
[(' foo3', 'Ooops')]
 req.headers
{' foo3': 'Ooops'}

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17322
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com