[issue24363] httplib fails to handle semivalid HTTP headers

2019-09-16 Thread Christian Schmidbauer
Change by Christian Schmidbauer: -- keywords: +patch pull_requests: +15787 pull_request: https://github.com/python/cpython/pull/12214

[issue24363] httplib fails to handle semivalid HTTP headers

2019-09-14 Thread Abhilash Raj
Abhilash Raj added the comment: Martin: Can you please create a PR for the added patch? If you are busy, I can do that for you, just wanted to ask before I do :) I am going to remove "easy" label from this issue, which IMO it clearly isn't given 4 years of history to catch up on and a few

[issue24363] httplib fails to handle semivalid HTTP headers

2017-02-07 Thread R. David Murray
R. David Murray added the comment: Yeah, I'm going to try to get to this this weekend.

[issue24363] httplib fails to handle semivalid HTTP headers

2017-02-07 Thread Guillaume Boudreau
Guillaume Boudreau added the comment: Any chance this could get reviewed and merged soon? I got hit by a similar issue (see #29445) where the server, which I don't control, sends me invalid HTTP headers, and that prevents all the headers that follow from being parsed. The latest attached

[issue24363] httplib fails to handle semivalid HTTP headers

2017-01-23 Thread Martin Panter
Martin Panter added the comment: Just a minor update with an extra get_payload() test I missed before -- versions: +Python 3.7 Added file: http://bugs.python.org/file46400/policy-flag.v2.patch

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-18 Thread R. David Murray
R. David Murray added the comment: I will try to review this in the not too distant future. You can ping me if I don't get to it by next Saturday. I think I'll probably prefer to call the flag something like _greedy_header_parsing, to reflect a change from assuming we've got body on a

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-15 Thread Martin Panter
Martin Panter added the comment: I pushed my Py 2 patch, since it is simpler and does not interfere with other modules. But it would still be good to get feedback on policy-flag.patch for Python 3.

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-15 Thread Roundup Robot
Roundup Robot added the comment: New changeset c38e10ad7c89 by Martin Panter in branch '2.7': Issue #24363: Continue parsing HTTP header in spite of invalid lines https://hg.python.org/cpython/rev/c38e10ad7c89 -- nosy: +python-dev

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-10 Thread Martin Panter
Martin Panter added the comment: Patch for Python 2 -- Added file: http://bugs.python.org/file44527/header.py2.patch

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-10 Thread Martin Panter
Martin Panter added the comment: Here is a fix using a policy flag. I called it “_py_body_detached”. -- Added file: http://bugs.python.org/file44524/policy-flag.patch

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-09 Thread R. David Murray
R. David Murray added the comment: Oh, yes, I forgot 2.7 was using the older code. Sure, that flag name sounds fine. I'm not worrying about policy flag name collisions, but perhaps I should be. How about _py_strict_end_of_headers? I don't really care what it is named for the bug fix, I'll

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-09 Thread Martin Panter
Martin Panter added the comment: I guess we could add this secret policy flag that the email parser checks. The solution should still be applied as a bug fix to 3.5 as well as 3.6+. I would have to make the flag “very” unique, to reduce the chance of it breaking user code. I.e. adding

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-09 Thread R. David Murray
R. David Murray added the comment: Hmm. Or maybe the latin-1 decode and FeedParser is better, since with BytesFeedParser and non-ascii you get Header objects, which you don't want. Either way, there's no TextIOWrapper involved.

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-09 Thread R. David Murray
R. David Murray added the comment: I'm not sure it is a good idea to backport this to 2.7, but if you want to, the same fix can be made in a more hackish way on 2.7 by putting a private variable on FeedParser to control the header parsing behavior.

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-09 Thread Senthil Kumaran
Changes by Senthil Kumaran: -- nosy: +orsenthil

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-09 Thread R. David Murray
R. David Murray added the comment: I think you can greatly simplify this code by using BytesFeedParser and feeding the input to it line by line. You don't need to gather the headers first, since you control how much you read and how much you send to the parser. The change to email is then
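The line-by-line feeding described in this comment might look roughly like the sketch below. It is only an illustration under stated assumptions: `read_headers` is a hypothetical helper name, not the function used in http.client, and the accompanying change to the email package (cut off in the snippet above) is not reproduced, so an invalid header line would still end header parsing here. What it shows is that the caller decides how much of the stream reaches the parser, so the body stays unread on the file object.

```python
import io
from email.parser import BytesFeedParser


def read_headers(fp):
    """Feed header lines to BytesFeedParser one at a time (hypothetical helper).

    Reading stops at the first blank line (or EOF); the blank line is consumed
    but not fed, so the parser only ever sees the header section.
    """
    parser = BytesFeedParser()
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        parser.feed(line)
    return parser.close()


# The stream position is left at the start of the body.
stream = io.BytesIO(b"Server: example\r\nContent-Length: 4\r\n\r\nbody")
message = read_headers(stream)
remaining = stream.read()
print(message["Server"], message["Content-Length"], remaining)
```

Because nothing past the blank line is fed, no TextIOWrapper or header pre-gathering step is needed in this sketch.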

[issue24363] httplib fails to handle semivalid HTTP headers

2016-09-08 Thread Martin Panter
Martin Panter added the comment: Updated patch for Python 3 now that Issue 22233 has been fixed. -- Added file: http://bugs.python.org/file44482/bypass-parsegen.v2.patch

[issue24363] httplib fails to handle semivalid HTTP headers

2016-08-12 Thread Martin Panter
Martin Panter added the comment: In order to avoid messing too much with the intricacies of the existing email parsing, here is a patch for Python 3 that limits the behaviour changes to the HTTP module. It should fix the bad handling of broken header lines. As a side effect, it should also

[issue24363] httplib fails to handle semivalid HTTP headers

2016-06-14 Thread Martin Panter
Martin Panter added the comment: I made a patch to fix all header section parsing by default in the email module (see Issue 26686). I’m not 100% sure if it is safe in general, but if it is, it would fix this bug. -- dependencies: +email.parser stops parsing headers too soon.

[issue24363] httplib fails to handle semivalid HTTP headers

2016-06-12 Thread Martin Panter
Martin Panter added the comment: See also Issue 26686; the same problem, but with parsing RFC5322 header fields, rather than HTTP.

[issue24363] httplib fails to handle semivalid HTTP headers

2016-06-11 Thread Martin Panter
Changes by Martin Panter: -- versions: +Python 3.6 -Python 3.4

[issue24363] httplib fails to handle semivalid HTTP headers

2015-11-28 Thread Martin Panter
Martin Panter added the comment: Since the Python 2 and Python 3 branches are different, two different patches would be needed here. Perhaps they could share common test cases though. Michael: I presume your proposal is for Python 2. I don’t understand the re.findall() expression; is there a

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-03 Thread Cory Benfield
Cory Benfield added the comment: It is obvious that this case could be treated as a folded (continuation) line. But in general I think it would be better to ignore the erroneous line, or to record it as a defect so that the server module or other user can check it. Just to clarify, in an

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-03 Thread Cory Benfield
Cory Benfield added the comment: While we're here and I'm recommending to drop as little data as possible: we need to be really careful about not exposing ourselves to any kind of data smuggling attack here. It's really important that we don't let attackers construct bodies of requests or

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-03 Thread Michael Del Monte
Michael Del Monte added the comment: Given that obs-fold is technically valid, then can I recommend reading the entire header first (reading to the first blank line) and then tokenizing the individual headers using a regular expression rather than line by line? That would solve the problem

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-03 Thread Michael Del Monte
Michael Del Monte added the comment: ... or perhaps if ':' in line and line[0] != ':': to avoid the colon-as-first-char bug that plagued this library earlier, though the only ill-effect of leaving it alone would be a header with a blank key; not the end of the world.
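As a rough illustration of the check suggested in this comment, the sketch below skips lines that have no colon (or that start with one) instead of ending the header section. `parse_headers_leniently` is a hypothetical name and this is not the patch that was committed. It also assumes two things raised elsewhere in the thread: folded lines containing only spaces or tabs must not end parsing, and continuation lines that follow a rejected line are dropped rather than attached to an earlier header.

```python
def parse_headers_leniently(lines):
    """Collect (name, value) pairs, ignoring invalid lines (hypothetical sketch)."""
    headers = []
    last_valid = False  # True while the previous field line was accepted
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            break  # only a truly empty line ends the header section
        if line[0] in " \t":
            # Folded (continuation) line; whitespace-only folds add nothing.
            folded = line.strip()
            if folded and last_valid:
                name, value = headers[-1]
                headers[-1] = (name, value + " " + folded)
            continue
        if ":" in line and line[0] != ":":
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
            last_valid = True
        else:
            last_valid = False  # ignore the line, keep parsing
    return headers


parsed = parse_headers_leniently([
    "Server: example\r\n",
    "garbage without a colon\r\n",
    "Header: obsolete but\r\n",
    " \r\n",
    " still folded\r\n",
    "\r\n",
    "body\r\n",
])
print(parsed)
```

Dropping continuations of a rejected line is a conservative choice made for this sketch; a real fix would also need to weigh the data smuggling concerns raised in the thread.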

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Michael Del Monte
Michael Del Monte added the comment: I don't want to speak out of school and you guys certainly know what you're doing, but it seems a shame to go through these gyrations -- lookahead plus unreading lines -- only to preserve the ability to parse email headers, when HTTP really does follow a

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread R. David Murray
R. David Murray added the comment: No, the point is to do best practical error recovery when faced with dirty data that may be dirty in various ways, and it doesn't really matter whether it is http headers or email headers. A line with leading whitespace is treated as part of the preceding

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Martin Panter
Martin Panter added the comment: Regarding the suggested fix for Python 2, make sure it does not prematurely end the parsing on empty folded lines (having only tabs and spaces in them). E.g. according to RFC 7230 this should be a single header field: bHeader: obsolete but\r\n b\r\n b

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread R. David Murray
R. David Murray added the comment: I think there may be a way to accomplish this in a reasonably straightforward fashion in python3 given that feedparser has an 'unreadline' function. The python2 case is probably going to be a more complicated change. And I agree that multiple lines should

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Cory Benfield
Cory Benfield added the comment: This is one of those bugs that's actually super tricky to correctly fix. The correct path is to have the goal of conservatively accepting as many headers as possible. Probably this means looking ahead to the next few lines and seeing if they appear to roughly

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Demian Brecht
Changes by Demian Brecht demianbre...@gmail.com: -- nosy: +demian.brecht

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread R. David Murray
R. David Murray added the comment: Since the email package has the correct logic for handling the blank continuation line case (even in Python2) (because, again, that derives from the original email standard), it might be reasonable to use feedparser's headersonly mode. If necessary we can

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Ian Cordasco
Ian Cordasco added the comment: Also I'm marking this as affecting 3.3, 3.4, and 3.5. I haven't tested against 3.5, but it definitely fails on 3.4. I hope to be able to test against 3.5.0b2 tonight -- versions: +Python 3.3, Python 3.4, Python 3.5

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Ian Cordasco
Ian Cordasco added the comment: FWIW, the proper section to reference now is 3.2 in RFC 7230 (https://tools.ietf.org/html/rfc7230#section-3.2) -- nosy: +icordasc

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Michael Del Monte
New submission from Michael Del Monte: Initially reported at https://github.com/kennethreitz/requests/issues/2622 Closely related to http://bugs.python.org/issue19996 An HTTP response with an invalid header line that contains non-blank characters but *no* colon (contrast

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread R. David Murray
R. David Murray added the comment: The current behavior probably comes out of the RFC822 world, where when seeing a line that doesn't look like a header the error recovery is to treat that line as the beginning of the body (ie: assume the blank line is missing). Is there in fact any guidance
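The RFC822-style recovery described in this comment can be observed with the email package directly. The snippet below is a small sketch using an invented sample message: the line without a colon is taken as the start of the body, so the valid header after it is never parsed, and the parser records a defect instead of raising.

```python
import email
from email import errors

# Sample response headers (invented for illustration); the second line is invalid.
raw = (
    "Server: example\r\n"
    "this line has no colon\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "body\r\n"
)

msg = email.message_from_string(raw)

# Only the header before the invalid line is kept.
print(msg.keys())
# The invalid line and everything after it end up in the payload.
print(repr(msg.get_payload()))
# The missing blank line is reported as a defect on the message.
print([type(defect).__name__ for defect in msg.defects])
```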

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread Michael Del Monte
Michael Del Monte added the comment: Thanks. Also I meant to have said, ...to terminate only on a *blank* non-header non-comment line, in accordance with RFC 2616 (and 7230). I note that the RFCs require CRLF to terminate but in my experience you can get all manner of blank lines, so

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: -- components: +email keywords: +easy nosy: +barry stage: - needs patch

[issue24363] httplib fails to handle semivalid HTTP headers

2015-06-02 Thread R. David Murray
R. David Murray added the comment: Ah, in fact that's exactly where it comes from, since httplib uses the email header parsing code. In python3 we are actually using the email package to parse the headers (which is sensible) (in 2.7 it is a copy of code from the old mimelib with some