[issue26686] email.parser stops parsing headers too soon when given a defective message.

2016-08-10 Thread Martin Panter

Martin Panter added the comment:

Thanks David. Since I am more interested in fixing this robustly for HTTP and 
similar protocols, I might focus on just Issue 24363. I would either confine my 
changes to the existing HTTP policy (or a new one) and start using that, or 
just address this from the HTTP package.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26686] email.parser stops parsing headers too soon when given a defective message.

2016-08-10 Thread R. David Murray

R. David Murray added the comment:

I would prefer if we did lookahead to see if the subsequent line looks like a 
header.  It's more complicated to do that, of course, and could still lead to 
false negatives.  However, I think that would probably retain enough backward 
compatibility to be acceptable.  It would also be sensible to make this a 
policy switch, and as I said elsewhere I'm fine with changing the defaults of 
the http policy even in 3.5.  (The downside of *that* is that I'm sure there 
are bugs hiding in the new header parsing code, so actually using the http 
policy to parse http headers will doubtless "allow" us to find some of them.)

Even more complicated, but a better heuristic: look ahead to the next blank 
line, up to some limit (5 lines?), and if you do find something that looks like 
a header, also make sure that none of the intermediate lines look like a MIME 
boundary.   That still leaves the question of what to do with a source text 
that has non-header lines up to the next blank line (this applies to one line 
lookahead as well).  Maybe see if there is more text after the blank line and 
if so assume the non-header is part of the header, otherwise not?
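
A minimal sketch of the lookahead heuristic described above, written as a standalone function rather than as a change to the email package. The names FIELD_RE, BOUNDARY_RE and header_section_continues are hypothetical, the boundary test is only a rough guess at what "looks like a MIME boundary" means, and the open question (only non-header lines up to the next blank line) is answered here by assuming the body has started.

```python
import re

# A field line: a name of printable ASCII characters other than ":",
# followed by ":". (Assumption: close to, but not copied from, the
# pattern the email package uses.)
FIELD_RE = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')
# Rough guess at a MIME boundary line: "--" followed by a non-space.
BOUNDARY_RE = re.compile(r'^--\S')


def header_section_continues(following, limit=5):
    """Guess whether a non-header line is a defect inside the header
    section (True) or the start of the body (False).

    `following` is the list of lines after the suspicious line.
    At most `limit` lines are inspected, stopping at a blank line.
    """
    saw_field = False
    for line in following[:limit]:
        if line == "":
            break                 # reached the blank separator line
        if BOUNDARY_RE.match(line):
            return False          # boundary-like line: assume body
        if FIELD_RE.match(line):
            saw_field = True
    return saw_field


# Lines following the unindented 'name="04EBD_._A546BB.zip"' line
after_defect = ["Content-Transfer-Encoding: base64", "", "UmFyIRoHAM+Q"]
print(header_section_continues(after_defect))
```

This version rejects a boundary-like line anywhere before the blank line, which is slightly stricter than checking only the lines between the defect and the next header.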

Regardless, lookahead may be difficult to code.  So an alternative that uses 
your approach, but triggered by a policy setting on http, would be acceptable 
backward compatibility wise.  If we want to we could even make an internal http 
policy that is compat32 plus this new flag.




[issue26686] email.parser stops parsing headers too soon when given a defective message.

2016-08-09 Thread Martin Panter

Martin Panter added the comment:

It would be nice to get feedback on whether my patch is sensible, especially 
from people familiar with normal usage of the “email” module, as opposed to 
its usage by the HTTP module.

--
stage:  -> patch review




[issue26686] email.parser stops parsing headers too soon when given a defective message.

2016-06-29 Thread ppperry

Changes by ppperry :


--
title: email.parser stops parsing headers too soon. -> email.parser stops 
parsing headers too soon when given a defective message.




[issue26686] email.parser stops parsing headers too soon.

2016-06-13 Thread Martin Panter

Martin Panter added the comment:

FWIW in the HTTP bug, David said 
“when seeing a line that doesn't look like a header the error recovery is to 
treat that line as the beginning of the body (ie: assume the blank line is 
missing).” I have no experience with email and RFC 5322 header handling, but it 
does make more sense to me to handle this as a defect in the header section, 
_not_ a genuine transition to the body (same as desired for the HTTP case).

Here is a patch that revives MalformedHeaderDefect (see Issue 14925), and 
continues parsing the rest of the header section instead of starting the body. 
But I am not sure how safe this change is. I did have to fix one unrelated set 
of tests (see headertest_msg in the Test8BitBytesHandling class) that did not 
include a blank line and was relying on the old behaviour.
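
To illustrate the idea of continuing past a malformed line, here is a minimal standalone sketch. It is not the attached patch and does not use the email package; split_header_section and FIELD_RE are hypothetical names, and the sample is the message part from the original report with the body shortened to its first line.

```python
import re

# A field line: a name of printable ASCII characters other than ":",
# followed by ":". Continuation lines start with a space or tab.
FIELD_RE = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')


def split_header_section(text):
    """Return (header_lines, defect_lines, body).

    Everything up to the first blank line is treated as the header
    section. Lines that are neither fields nor continuations are
    collected as defects instead of starting the body.
    """
    lines = text.splitlines()
    headers, defects = [], []
    body_start = len(lines)       # no blank line: everything is header
    for i, line in enumerate(lines):
        if line == "":
            body_start = i + 1
            break
        if FIELD_RE.match(line) or line[0] in " \t":
            headers.append(line)
        else:
            defects.append(line)
    body = "\n".join(lines[body_start:])
    return headers, defects, body


sample = (
    'Content-Disposition: inline; filename="04EBD_._A546BB.zip"\n'
    'Content-Type: application/x-rar-compressed; x-unix-mode=0600;\n'
    'name="04EBD_._A546BB.zip"\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw\n'
)
headers, defects, body = split_header_section(sample)
print(headers)
print(defects)
print(body)
```

With this approach the Content-Transfer-Encoding field stays in the header section and the unindented line ends up in the defect list.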

--
keywords: +patch
versions: +Python 3.6
Added file: http://bugs.python.org/file43374/continue-header.patch




[issue26686] email.parser stops parsing headers too soon.

2016-04-01 Thread Martin Panter

Martin Panter added the comment:

Also see Issue 24363, basically the same bug in the HTTP parser, which 
(ab?)uses the email package to do most of the work. In that case, according to 
my note the faulty header field ends with:

X-Frame-Options: SAMEORIGIN\r\n
Set-Cookie: mb-CookieP=; HttpOnly; \r\n
Secure\r\n
Set-Cookie: mb-CookieP=; HttpOnly; Secure\r\n
\r\n

But in this case, perhaps because of the implications of dropping the “Secure” 
flag, people are asking that the faulty line be appended to the previous header 
field. IMO that is not super important though. An alternative would 
be to add it to the defect list, and then raise an exception or warning if any 
defects are detected.
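
The "raise an exception if any defects are detected" alternative can be sketched on top of the existing email API as follows. HeaderDefectError and parse_headers_strict are hypothetical names for this illustration; the only library features relied on are email.message_from_string() and the defects list on the returned message. Which defect class is reported may vary between Python versions, so the sketch only checks that the list is non-empty.

```python
import email


class HeaderDefectError(ValueError):
    """Hypothetical exception type used only in this sketch."""


def parse_headers_strict(text):
    """Parse a header block and refuse it if the parser noted defects."""
    msg = email.message_from_string(text)
    if msg.defects:
        names = ", ".join(type(d).__name__ for d in msg.defects)
        raise HeaderDefectError("defective header section: " + names)
    return msg


good = (
    "X-Frame-Options: SAMEORIGIN\r\n"
    "Set-Cookie: mb-CookieP=; HttpOnly; Secure\r\n"
    "\r\n"
)
# The faulty header fields quoted above: "Secure" sits on its own,
# unindented line.
bad = (
    "X-Frame-Options: SAMEORIGIN\r\n"
    "Set-Cookie: mb-CookieP=; HttpOnly; \r\n"
    "Secure\r\n"
    "Set-Cookie: mb-CookieP=; HttpOnly; Secure\r\n"
    "\r\n"
)

print(parse_headers_strict(good)["X-Frame-Options"])
try:
    parse_headers_strict(bad)
except HeaderDefectError as exc:
    print("rejected:", exc)
```

Rejecting the whole header block avoids silently dropping the “Secure” flag, at the cost of failing responses that the current parser accepts.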

--
nosy: +martin.panter




[issue26686] email.parser stops parsing headers too soon.

2016-04-01 Thread SilentGhost

Changes by SilentGhost :


--
components: +email
nosy: +barry, r.david.murray
versions: +Python 3.5 -Python 3.4




[issue26686] email.parser stops parsing headers too soon.

2016-04-01 Thread Mark Sapiro

Mark Sapiro added the comment:

Added Python 2.7 to versions:

--
versions: +Python 2.7




[issue26686] email.parser stops parsing headers too soon.

2016-04-01 Thread Mark Sapiro

New submission from Mark Sapiro:

Given an admittedly defective (the folded Content-Type: isn't indented) message 
part with the following headers/body

---
Content-Disposition: inline; filename="04EBD_._A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
name="04EBD_._A546BB.zip"
Content-Transfer-Encoding: base64

UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...
---

email.parser parses the headers as

---
Content-Disposition: inline; filename="04EBD_._A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
---

and the body as

---
name="04EBD_._A546BB.zip"
Content-Transfer-Encoding: base64

UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...
---

and shows no defects.

This is wrong. RFC 5322 section 2.1 is clear that everything up to the first 
empty line is headers. Even the docstring in the email/parser.py module says 
"The header block is terminated either by the end of the string or by a blank 
line."

Since the message is defective, it isn't clear what the correct result should 
be, but I think

Headers:
Content-Disposition: inline; filename="04EBD_._A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
Content-Transfer-Encoding: base64

Body:
UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...

Defects:
name="04EBD_._A546BB.zip"

would be more appropriate. The problem is that the Content-Transfer-Encoding: 
base64 header is not in the headers, so get_payload(decode=True) doesn't 
decode the base64-encoded body, making malware recognition difficult.
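
A minimal sketch for reproducing this with the standard library. The text is the sample part above with the body shortened to its first line, and the variable names are illustrative. What it prints depends on the Python version in use, so no expected output is given.

```python
import email

# The defective part from this report: the folded Content-Type
# continuation line is not indented. Body shortened for illustration.
raw = (
    'Content-Disposition: inline; filename="04EBD_._A546BB.zip"\n'
    'Content-Type: application/x-rar-compressed; x-unix-mode=0600;\n'
    'name="04EBD_._A546BB.zip"\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw\n'
)

msg = email.message_from_string(raw)

print("headers:", msg.keys())
print("CTE    :", msg["Content-Transfer-Encoding"])
print("defects:", msg.defects)
print("payload:", msg.get_payload()[:40])
# With decode=True the payload is only base64-decoded if the parser
# kept Content-Transfer-Encoding in the header section.
print("decoded:", msg.get_payload(decode=True)[:8])
```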

--
components: Library (Lib)
messages: 262750
nosy: msapiro
priority: normal
severity: normal
status: open
title: email.parser stops parsing headers too soon.
type: behavior
versions: Python 3.4
