[issue25728] email parser ignores inner multipart boundary when outer message duplicates it

2015-11-25 Thread R. David Murray

R. David Murray added the comment:

I am open to (and will review) a patch that applies simple heuristics to trying 
to guess correctly about such messages, but only if it doesn't add too much 
complexity to the parser.  I'm not certain I would consider it for a bug fix 
release, but I'll postpone that decision until I review the patch (the issue 
is: would it have the potential to break applications that are currently 
working?  I'm guessing not, but I tend to be cautious about such issues.)

--
stage:  -> needs patch
versions: +Python 3.5, Python 3.6 -Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25728] email parser ignores inner multipart boundary when outer message duplicates it

2015-11-24 Thread R. David Murray

R. David Murray added the comment:

Who is to say that the outer message is defective and not the inner one?  How 
can a parser decide which part belongs to which message?  It isn't an AI.

The whole message is defective, so all bets are off :)  The library can't 
successfully parse such a message, though it goes to significant pains to make 
sure it never generates one.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25728] email parser ignores inner multipart boundary when outer message duplicates it

2015-11-24 Thread Forest

Forest added the comment:

RFC 2046 says that the outer message is defective, since it uses a boundary 
delimiter that is quite obviously present inside one of the encapsulated parts:

https://tools.ietf.org/html/rfc2046#section-5.1

"The boundary delimiter MUST NOT appear inside any of the encapsulated parts, 
on a line by itself or as the prefix of any line."

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25728] email parser ignores inner multipart boundary when outer message duplicates it

2015-11-24 Thread Forest

Forest added the comment:

> The library can't successfully parse such a message

It could successfully parse such a message, if it matched against inner message 
boundaries before outer message boundaries.  (One implementation would be to 
keep a list of all ancestor boundaries and traverse the list from most recent 
to least recent, but there might be more efficient ways to do it.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25728] email parser ignores inner multipart boundary when outer message duplicates it

2015-11-24 Thread Forest

New submission from Forest:

When a multipart message erroneously defines a boundary string that conflicts 
with an inner message's boundary string, the parser ignores the (correct) inner 
message's boundary, and treats all matching boundary lines as if they belong to 
the (defective) outer message.

This file from the test_email suite demonstrates the problem:
Python-3.5.0/Lib/test/test_email/data/msg_15.txt

Consequentially, the inner multipart/alternative message is parsed with 
is_multipart() returning False, and a truncated payload.

Moreover, unit tests like test_same_boundary_inner_outer() expect to find the 
StartBoundaryNotFoundDefect defect on the inner message in that file, which 
seems wrong to me, since the inner message is not defective.  According to the 
RFCs, the outer message should have been generated with a boundary string that 
does not appear anywhere in its encoded body (including the inner message).  
The outer message is therefore the defective one.

--
components: email
messages: 255309
nosy: barry, forest, r.david.murray
priority: normal
severity: normal
status: open
title: email parser ignores inner multipart boundary when outer message 
duplicates it
type: behavior
versions: Python 2.7, Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25728] email parser ignores inner multipart boundary when outer message duplicates it

2015-11-24 Thread Forest

Forest added the comment:

I thought at first that this might be deliberate behavior in order to comply 
with RFC 2046 section 5.1.2.
https://tools.ietf.org/html/rfc2046#section-5.1.2

After carefully re-reading that section, I see that it is just making sure an 
outer message's boundary will still be recognized if an inner multipart message 
is missing its boundary markers (for example if the inner message was 
truncated).  It does not describe any circumstances under which the inner 
message's boundary markers should be ignored when they are present.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com