[issue42433] mailbox.mbox fails on non ASCII characters

2020-12-05 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

3.7 only gets security fixes.  If and when someone merges something, that 
person will decide whether to backport.

--
versions:  -Python 3.7, Python 3.8, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42433] mailbox.mbox fails on non ASCII characters

2020-12-05 Thread Florian Klink


Florian Klink  added the comment:

Based on https://bugs.python.org/issue42433#msg382169 I added back the versions 
in which the bug is present.

The PR is up and appropriately linked (I think?) - let me know if there's 
anything left to be done from my side.

--
versions: +Python 3.7, Python 3.8, Python 3.9




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-30 Thread R. David Murray


R. David Murray  added the comment:

After thinking about it some more: given that, when there is no non-ascii, 
mbox will happily treat *anything* as valid on the "From " line, I think we 
should consider blowing up on non-ascii to be a bug.

--




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-29 Thread Florian Klink


Florian Klink  added the comment:

I opened https://github.com/python/cpython/pull/23553 - PTAL.

I made this an enhancement for 3.10 - but it could probably also be backported 
to older versions.

--
keywords: +patch
message_count: 4.0 -> 5.0
pull_requests: +22434
stage:  -> patch review
type:  -> enhancement
versions:  -Python 3.8, Python 3.9
pull_request: https://github.com/python/cpython/pull/23553




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-27 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

(The non-ascii chars are “ and ”, versus ascii ".)

Florian, although you did not select a 'Type', selecting multiple versions 
implicitly claims that the current behavior is a bug.  I believe R.David has 
explained that it is not, even if sub-optimal.  Do you want to
A. Argue on the basis of some claim in the docs that this really is a bug.
B. Close this issue as 'Not a bug'.
C. Turn it into an enhancement issue for 3.10 by calling decode in the 
appropriate place.  If so, you might first try making the change in your code 
after finding the appropriate place and see if the improvement is worth the 
change.
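
A rough sketch of what option C could look like when tried in user code 
first, assuming a subclass that overrides get_message.  TolerantMbox and 
demo are hypothetical names, and _lookup, _file and _message_factory are 
private attributes of the stdlib class, so this depends on implementation 
details that may differ between versions; it is not the change proposed in 
any PR.

```python
import mailbox
import os
import tempfile


class TolerantMbox(mailbox.mbox):
    """Hypothetical subclass: never fail on a non-ASCII "From " line."""

    def get_message(self, key):
        # Assumption: these private helpers behave as in current CPython.
        start, stop = self._lookup(key)
        self._file.seek(start)
        linesep = os.linesep.encode("ascii")
        from_line = self._file.readline().replace(linesep, b"")
        body = self._file.read(stop - self._file.tell())
        msg = self._message_factory(body.replace(linesep, b"\n"))
        # Replace undecodable bytes instead of raising UnicodeDecodeError.
        msg.set_from(from_line[5:].decode("ascii", errors="replace"))
        return msg


def demo():
    """Parse a tiny made-up mbox whose body has an unescaped 'From ' line."""
    data = (
        "From alice@example.com Fri Jan  1 00:00:00 2016\n"
        "Subject: made-up sample\n"
        "\n"
        "Some body text.\n"
        "From here I was hoping to run “dbus-send”\n"
        "More body text.\n"
    ).encode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".mbox")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        mb = TolerantMbox(path, create=False)
        try:
            return [msg.get_from() for msg in mb.values()]
        finally:
            mb.close()
    finally:
        os.remove(path)


from_lines = demo()
print(from_lines)
```

The unescaped line is still treated as the start of a second message; the 
sketch only keeps the parse from failing, which is the behaviour mbox already 
has for ASCII-only input.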

--
nosy: +terry.reedy




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-27 Thread Terry J. Reedy


Change by Terry J. Reedy :


--
versions: +Python 3.10 -Python 3.7




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-22 Thread Florian Klink


Florian Klink  added the comment:

Yeah, I'm not questioning that this might be badly formatted, but given that 
these files are out there and the parser is somewhat forgiving in other cases, 
it should be tolerant here as well.

--




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-22 Thread R. David Murray

R. David Murray  added the comment:

The problem with that archive is that it is not in proper mbox format.  It 
contains the following line (5689):

From here I was hoping to run something like “dbus-send –system 
–dest=Test.Me –print-reply /Japan Japan.Reset.Test string:”Hello””

You will note that there is no leading '>' on that line to escape that 'From '. 
 So mbox tries to build a 'From ' line from it, and fails because 'From ' lines 
should not contain any non-ascii characters.  It can be argued that that 
failure is sub-optimal...it should probably be calling decode('ascii', 
errors='replace') so that the parse doesn't fail, just like it would not fail 
if there were no non-ascii in the unescaped 'From ' line.
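
To make the difference concrete, here is a small sketch of strict versus 
tolerant decoding applied to a shortened copy of that line.  The helper names 
decode_strict and decode_tolerant are made up for illustration, and the line 
is assumed to be stored as UTF-8 in the archive.

```python
# Shortened copy of the unescaped body line, as bytes (UTF-8 assumed).
raw_line = "From here I was hoping to run “dbus-send –system".encode("utf-8")
payload = raw_line[5:]  # drop the leading b"From "


def decode_strict(data):
    """Return the decoded text, or the exception that strict ASCII raises."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc


def decode_tolerant(data):
    """Decode as ASCII, turning each undecodable byte into U+FFFD."""
    return data.decode("ascii", errors="replace")


strict_result = decode_strict(payload)
tolerant_result = decode_tolerant(payload)
print(type(strict_result).__name__)
print(tolerant_result)
```

With errors='replace' the non-ascii bytes are lost, but the ASCII part of the 
line survives and no exception is raised.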

--




[issue42433] mailbox.mbox fails on non ASCII characters

2020-11-22 Thread Florian Klink


New submission from Florian Klink :

I'm importing some mbox archives into my maildirs, and use `mailbox.mbox` to 
parse archives created by pipermail.

Some of these archives seem to contain non-ascii characters, and python just 
throws a `UnicodeDecodeError` and refuses to process the archive.

Reproducer (fails as described on 3.7.9, 3.8.5, 3.9.0):

```
curl https://lists.freedesktop.org/archives/systemd-devel/2016-January.txt.gz | 
zcat > mbox.txt
python3 -c "import mailbox; mb = mailbox.mbox('mbox.txt');mb.items()"
```
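
For trying this without downloading the archive, a minimal offline sketch 
follows.  The sample message and the helper name try_parse are made up for 
illustration; the only assumption about the real archive is that it contains 
a body line starting with "From " that includes non-ascii characters.  The 
outcome depends on the Python version, so the sketch reports either the 
parsed "From " lines or the exception, without claiming which one you get.

```python
import mailbox
import os
import tempfile

# Made-up two-line body; the second body line starts with an unescaped
# "From " and contains non-ASCII quotes, like the line in the real archive.
SAMPLE = (
    "From alice@example.com Fri Jan  1 00:00:00 2016\n"
    "Subject: made-up sample\n"
    "\n"
    "Some body text.\n"
    "From here I was hoping to run “dbus-send”\n"
    "More body text.\n"
).encode("utf-8")


def try_parse(data):
    """Write data to a temporary mbox file and try to read every message.

    Returns the list of "From " line contents, or the UnicodeDecodeError
    raised while reading.
    """
    fd, path = tempfile.mkstemp(suffix=".mbox")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        mb = mailbox.mbox(path, create=False)
        try:
            return [msg.get_from() for _, msg in mb.items()]
        except UnicodeDecodeError as exc:
            return exc
        finally:
            mb.close()
    finally:
        os.remove(path)


outcome = try_parse(SAMPLE)
print(repr(outcome))
```

Swapping the curly quotes in SAMPLE for plain " makes the same file parse as 
two messages, which is a quick way to confirm that only the non-ascii bytes 
trigger the failure.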

--
components: email
messages: 381607
nosy: barry, flokli, r.david.murray
priority: normal
severity: normal
status: open
title: mailbox.mbox fails on non ASCII characters
versions: Python 3.7, Python 3.8, Python 3.9
