[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

2019-12-21 Thread Abhilash Raj


Change by Abhilash Raj :


--
nosy: +maxking

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

2019-12-17 Thread R. David Murray


R. David Murray  added the comment:

All of which isn't to discount that you might have found a bug, by the way, 
if you want to investigate further :)

--




[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

2019-12-17 Thread R. David Murray


R. David Murray  added the comment:

The problem is that you are starting with different inputs.  unicode strings 
and bytes are different things, and so parsing them can produce different 
results.  The fact of the matter is that email messages are defined to be 
bytes, so parsing a unicode string pretending it is an email message is just 
asking for errors anyway.  The string parsing methods are really only provided 
for backward compatibility and historical reasons.

I thought this was clear from the existing documentation, but clearly it isn't 
:)  I'll review a suggested doc change, but the thing to explain is not that 
parse and parsebytes might produce different results, but that parsing email 
from strings is not a good idea and will likely produce unexpected results for 
anything except the simplest non-mime messages.

Note: the reason you got different checksums might have had to do with line 
ends, depending on how you calculated the checksums.  You should also consider 
using get_content and not get_payload.  get_payload has a weird legacy API that 
doesn't always do what you think it will, and that might be another source of 
checksum issues.  But really, parsing a unicode representation of a mime 
message is just likely to be buggy.
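
To illustrate the line-end point, here is a minimal sketch that hashes each part via get_content and normalizes line endings in text parts first. It assumes the difference really is CRLF versus LF, which this thread does not confirm. RAW is a made-up message standing in for test.eml, and part_digests is a hypothetical helper, not part of the email package.

```python
import hashlib
from email import policy
from email.parser import BytesParser
from io import BytesIO

# Made-up two-part message with CRLF line endings (a stand-in for test.eml).
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello,\r\nplain text\r\n"
    b"--sep\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Hello,</p>\r\n<p>html</p>\r\n"
    b"--sep--\r\n"
)


def part_digests(msg):
    """Return one MD5 hex digest per leaf part (hypothetical helper)."""
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        content = part.get_content()  # str for text/*, bytes otherwise
        if isinstance(content, str):
            # Assumption: only the line-ending style differs between parsers.
            content = content.replace("\r\n", "\n").encode("utf-8")
        digests.append(hashlib.md5(content).hexdigest())
    return digests


# get_content needs the modern policy, so pass policy.default explicitly.
parser = BytesParser(policy=policy.default)
digests_parsebytes = part_digests(parser.parsebytes(RAW))
digests_parse = part_digests(parser.parse(BytesIO(RAW)))
print(digests_parsebytes == digests_parse)
```

If the digests still differ after normalization on a real message, line endings are not the whole story and the bug is worth investigating further.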

--




[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

2019-12-17 Thread Manfred Kaiser


Manfred Kaiser  added the comment:

I think the best way is to fix the documentation. The reason is that when a 
developer relies on the behavior of a function and that behavior is changed, a 
program may work incorrectly.

Just think about forensic stuff. If a hash value was created with the 
"parsebytes" method and the behavior is changed to match the behavior of 
the "parse" method, the evidence cannot be validated with the latest 
Python versions.

We could add a comment to the documentation. For example: "parsebytes parses the 
mail in a different way than parse, which may produce slightly different 
messages. If you rely on the same behavior for file and bytes-like objects, you 
can use the parse method with BytesIO."
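
A minimal sketch of that workaround, assuming the goal is only that in-memory bytes and an on-disk file go through the same code path. RAW is a made-up message, and parse_like_file and leaf_payloads are hypothetical helper names, not part of the email package.

```python
import os
import tempfile
from email.parser import BytesParser
from io import BytesIO

# Made-up two-part message with CRLF line endings.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"plain text\r\n"
    b"--sep\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>html</p>\r\n"
    b"--sep--\r\n"
)


def parse_like_file(data):
    """Parse a bytes-like object via parse(), as if it were a binary file."""
    return BytesParser().parse(BytesIO(data))


def leaf_payloads(msg):
    """Collect the decoded payload of every non-multipart part."""
    return [
        part.get_payload(decode=True)
        for part in msg.walk()
        if not part.is_multipart()
    ]


# Write the same bytes to disk and parse them from a binary file object.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.eml")
    with open(path, "wb") as fh:
        fh.write(RAW)
    with open(path, "rb") as fh:
        from_file = BytesParser().parse(fh)

from_memory = parse_like_file(RAW)
print(leaf_payloads(from_file) == leaf_payloads(from_memory))
```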

--




[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

2019-12-17 Thread Manfred Kaiser


New submission from Manfred Kaiser :

I used email.parser.BytesParser for parsing mails. 

In one program I used parse, because the email was stored in a file.
In a second program the email was stored in memory as a bytes object.

I created hash values from each part and compared them, to check if a part is 
already known to my programs. This works for attachments, but not for HTML and 
plain text parts.

Documentation for parsebytes:

Similar to the parse() method, except it takes a bytes-like object instead of a 
file-like object. Calling this method on a bytes-like object is equivalent to 
wrapping bytes in a BytesIO instance first and calling parse().

When I read the documentation, I expected that both methods will produce the 
same output.

The test mail contains 2 MIME parts: one with HTML and one with plain text.

The parse method with a file and the parse method with bytes data wrapped in a 
BytesIO produce the same hashes. The parsebytes method creates different 
hashes.

Output of my test program:

MD5 sums with parsebytes with bytes data
3f4ee7303378b62f723a8d958797507a
45c72465b931d32c7e700d2dd96f8383

MD5 sums with parse and BytesIO with bytes data
fb0599d92750b72c25923139670e5127
9a54b64425b9003a9e6bf199ab6ba603

MD5 sums with parse from file
fb0599d92750b72c25923139670e5127
9a54b64425b9003a9e6bf199ab6ba603



Is this expected behavior, or is it an error?

--
components: email
files: test.eml
messages: 358533
nosy: barry, mkaiser, r.david.murray
priority: normal
severity: normal
status: open
title: email.parser.BytesParser - parse and parsebytes work not equivalent
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file48785/test.eml




[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent

2019-12-17 Thread Manfred Kaiser


Change by Manfred Kaiser :


Added file: https://bugs.python.org/file48786/test.py
