[issue32298] Email.quopriprime over-encodes characters

2017-12-13 Thread Geoff Kuenning

Geoff Kuenning  added the comment:

I should have read that part of RFC 2047 before I submitted.

I'd love to claim that I'm going to write a patch that would do as you suggest. 
 But the reality is that I'm unlikely to find the time, so I'm going to be wise 
for once and avoid promising what I can't deliver.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32298] Email.quopriprime over-encodes characters

2017-12-12 Thread R. David Murray

R. David Murray  added the comment:

And of course tools can grep for "f...@bar.com": you can't use encoded words in 
an address, only in the display name.

However, it occurs to me that in fact the restriction applies only to phrases, 
so one could use a less restrictive character set in an unstructured header 
such as the Subject, and that would indeed be nice.  The old header folder 
(python 2.7 and python 3.x compat32 policy) can't do it, because they don't 
know anything about the syntax of the headers they fold, they just use a bunch 
of heuristics.  The new policies in python3, however, use a smarter folder from 
_header_value_parser, and that *does* have access to the full parse tree for 
the header, and so could make smart decisions about which character set to use 
for the encoded word encoding.

If you'd like to try your hand at a PR implementing this idea, I'll be happy to 
provide advice and do a review.  It's not going to be anywhere near as simple 
as the one line change you proposed here, though :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32298] Email.quopriprime over-encodes characters

2017-12-12 Thread R. David Murray

R. David Murray  added the comment:

>From RFC 2047:

(3) As a replacement for a 'word' entity within a 'phrase', for example,
one that precedes an address in a From, To, or Cc header.  The ABNF
definition for 'phrase' from RFC 822 thus becomes:

phrase = 1*( encoded-word / word )

In this case the set of characters that may be used in a "Q"-encoded
'encoded-word' is restricted to: .  An 'encoded-word' that appears within a
'phrase' MUST be separated from any adjacent 'word', 'text' or
'special' by 'linear-white-space'.

The reason for this is that things like '@' are syntactically significant in 
headers and so must be encoded.

--
nosy: +r.david.murray
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32298] Email.quopriprime over-encodes characters

2017-12-12 Thread Geoff Kuenning

Geoff Kuenning  added the comment:

Oops, that loop is a bit too generous.  Here's a better one:

for c in list(range(33, 61)) + [62] + list(range(64, 95)) + list(range(96,127)):

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32298] Email.quopriprime over-encodes characters

2017-12-12 Thread Geoff Kuenning

New submission from Geoff Kuenning :

Email.quopriprime creates a map of header and body bytes that need no encoding:

for c in b'-!*+/' + ascii_letters.encode('ascii') + digits.encode('ascii'):
_QUOPRI_HEADER_MAP[c] = chr(c)

This map is overly restrictive; in fact only two printable characters need to 
be omitted: the space and the equals sign.  The following revision to the loop 
creates a correct table:

for c in list(range(33, 61)) + list(range(62, 127)):
_QUOPRI_HEADER_MAP[c] = chr(c)

Why does this matter?  Well, first, it's wasteful since it creates messages 
with larger headers than necessary.  But more important, it makes it impossible 
for other tools to operate on the messages unless they're encoding aware; for 
example, one can't easily grep for "f...@bar.com" because the at sign is 
encoded as =40.

--
components: Library (Lib)
messages: 308181
nosy: gkuenning
priority: normal
severity: normal
status: open
title: Email.quopriprime over-encodes characters
type: behavior
versions: Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com