[issue34954] Getting an email's subject is error-prone

2018-10-11 Thread Alex Corcoles

Alex Corcoles  added the comment:

Duh, I'm an idiot: I only tested policy.HTTP and *NOT* supplying a policy 
(which I believed was equivalent to using policy.default).

policy.default and policy.SMTP do indeed produce a newline-less subject.
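For anyone landing here later, here is a minimal sketch of the difference. It reuses the folded Nagios subject from the transcript further down this thread; the To: address is a placeholder. Without a policy argument, the parser uses the legacy compat32 policy and keeps the folding newline. policy.default and policy.SMTP unfold the header when it is fetched.

```python
import email
from email import policy

# A Subject header folded across two lines, as in the Nagios example in this thread.
# The To: address is a placeholder.
RAW = (
    "To: someone@example.com\n"
    "Subject: ** ACKNOWLEDGEMENT Host Alert: archerc7.bcn.int.pdp7.net is DOWN\n"
    " **\n"
    "\n"
    "* Nagios *\n"
)

# No policy argument: the legacy compat32 policy keeps the fold in the value.
legacy = email.message_from_string(RAW)
# policy.default and policy.SMTP unfold the header value when it is fetched.
modern = email.message_from_string(RAW, policy=policy.default)
smtp = email.message_from_string(RAW, policy=policy.SMTP)

print(repr(legacy["Subject"]))
print(repr(str(modern["Subject"])))
print(repr(str(smtp["Subject"])))
```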

I only tested policy.HTTP because the docs talk about unlimited line length, 
but that's not a problem of the docs; rather, it's a problem of my idiocy.

Given this, I agree with everything you said. Personally I'd prefer it if 
policy.default were the default, but I guess that won't change for backwards 
compatibility reasons, and it'd be excessive to create a new set of 
functions and deprecate the old ones, so I'm happy if this remains closed.

Apologies for my stupidity,

Álex

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34954] Getting an email's subject is error-prone

2018-10-11 Thread R. David Murray


R. David Murray  added the comment:

I'm guessing you got confused by the fact that the HTTP policy doesn't *add* 
newlines when *serializing*.  If you can point to the part of the docs you 
read that caused that confusion, maybe we can improve it.
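Here is a rough sketch of what "not adding newlines when serializing" means, using a made-up long subject. policy.SMTP folds the header at its max_line_length of 78. policy.HTTP has max_line_length=None, so it writes the header on a single physical line. Neither setting changes how a header is unfolded when parsing. The helper name header_lines is hypothetical.

```python
from email import policy
from email.message import EmailMessage

# A made-up subject long enough to exceed the 78-character default line length.
LONG_SUBJECT = "** ACKNOWLEDGEMENT Host Alert: " + "very " * 15 + "long subject **"


def header_lines(pol):
    """Hypothetical helper: serialize a one-header message, return its header lines."""
    msg = EmailMessage(policy=pol)
    msg["Subject"] = LONG_SUBJECT
    text = msg.as_string()
    # The header block ends at the first blank line (two consecutive linesep).
    header_block = text.split(pol.linesep * 2, 1)[0]
    return header_block.split(pol.linesep)


smtp_lines = header_lines(policy.SMTP)  # folded at max_line_length=78
http_lines = header_lines(policy.HTTP)  # max_line_length=None: never folded

print(len(smtp_lines), len(http_lines))
```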

--


[issue34954] Getting an email's subject is error-prone

2018-10-11 Thread R. David Murray


R. David Murray  added the comment:

Can you demonstrate that policy.default and policy.SMTP produce a subject with 
newlines?  If they do, that is a serious bug.

Please don't reopen the issue.  I'll reopen it if you convince me there is a 
bug :)

The statement you suggest we add is not appropriate[*], since the python3 email 
library *is* a high-level library now.  If it isn't handling something for you 
when you use policy.default or policy.SMTP, then that is a bug.  (Well, its 
MIME multipart handling still leaves something to be desired...you still have 
to know more than is optimal about multiparts, but the hooks are there for 
someone to improve that aspect further.)

[*] The part about the protocol is certainly true, though :)

--
status: open -> closed


[issue34954] Getting an email's subject is error-prone

2018-10-11 Thread Alex Corcoles

Alex Corcoles  added the comment:

Well, I think that having to choose the "HTTP" policy to get a message's 
subject without newlines goes against the expectations of anyone who is not 
well versed in email.

It's not very easy to deduce that, out of all the available policies, HTTP is 
the one that has this effect (or that you should write your own).

It's not obvious that a subject can have newlines, as I don't think I've ever 
seen a MUA that does not hide them.

You can be bitten quite easily by that (we have, more than once).

It's the stdlib maintainers' prerogative to decide that they are going to 
provide low-level libraries (and in general, I agree with that; high-level 
stdlibs have a lot of problems), but I'd at least include a warning like:

"Email is an old and annoying protocol, and parsing email is full of annoyances 
and exceptions. email provides low-level building blocks to handle email in 
detail. If you want high-level processing we advise you to look at libraries 
that build on it".

In any case, email.policy provides more hints about headers being word-wrapped, 
and while it's not ideal, it certainly is an improvement WRT Python 2. This 
bug has helped me, and I hope someone will read it when Googling for the 
same problem. So while I think more could be done, if you close this I 
won't complain.

Thanks,

Álex

--
status: closed -> open


[issue34954] Getting an email's subject is error-prone

2018-10-10 Thread R. David Murray


R. David Murray  added the comment:

The new policies *make* the email library that higher-level library; that was 
pretty much the whole point :)  I don't know how to make getting the fully 
decoded subject more intuitive than:

  msg['subject']

The fact that you have to specify a policy is due to backward compatibility 
concerns, and there's not really any way around that.  That's the only 
difference between your two examples (other than the fact that the second one 
does what you want :).

Note that you *really* want to be using message_from_bytes, and for email 
either policy.default or policy.SMTP.  This *is* documented in the python3 
docs.  If you don't find them clear, then an issue to improve the docs would be 
welcome.
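Here is a minimal sketch of that recommendation, using a hypothetical raw message whose Subject is RFC 2047 encoded and folded across two lines. With message_from_bytes and policy.default, msg['subject'] comes back decoded and unfolded, with no separate decode_header call.

```python
import email
from email import policy

# Hypothetical raw message: an RFC 2047 encoded Subject split across two lines.
RAW = (
    b"From: someone@example.com\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9?=\r\n"
    b" =?utf-8?q?_au_lait?=\r\n"
    b"\r\n"
    b"body\r\n"
)

msg = email.message_from_bytes(RAW, policy=policy.default)
subject = msg["subject"]  # header lookup is case-insensitive
print(str(subject))
```

The returned header object is a str subclass, so it can be used anywhere a plain string is expected.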

Since python2 is approaching EOL, we could also start transitioning to 
policy.default actually being the *default*.  That will take two release cycles 
(one that will generate a deprecation notice that the default is going to 
change, and another that will actually make the change).

--
status: open -> closed


[issue34954] Getting an email's subject is error-prone

2018-10-10 Thread Alex Corcoles


Alex Corcoles  added the comment:

To clarify (and maybe help someone who comes across this later), you mean:

In [1]: message_text = """To: a...@corcoles.net
   ...: Subject: ** ACKNOWLEDGEMENT Host Alert: archerc7.bcn.int.pdp7.net is DOWN
   ...:  **
   ...: User-Agent: Heirloom mailx 12.5 7/5/10
   ...: MIME-Version: 1.0
   ...: Content-Type: text/plain; charset=us-ascii
   ...: Content-Transfer-Encoding: 7bit
   ...: 
   ...: * Nagios *
   ...: """
In [2]: import email
In [4]: message = email.message_from_string(message_text)
In [5]: message.get('Subject')
Out[5]: '** ACKNOWLEDGEMENT Host Alert: archerc7.bcn.int.pdp7.net is DOWN\n **'

In [7]: from email import policy
In [8]: message = email.message_from_string(message_text, policy=policy.HTTP)
In [9]: message.get('Subject')
Out[9]: '** ACKNOWLEDGEMENT Host Alert: archerc7.bcn.int.pdp7.net is DOWN **'

Yeah, there's a bundled policy that does what I need, but I think it's not very 
intuitive.

I get that the stdlib is deliberately low level in these parts, and that it's 
more of a building block for higher-level libraries, but I still feel that 
getting an email's subject in a friendly fashion should be easy and intuitive 
in the stdlib, or the stdlib's docs should clearly point out that you should 
look for a higher-level library because email is hard.

OTOH, working with mail sucks and should be discouraged, so if you want to 
close this, I definitely won't complain.

--
status: closed -> open


[issue34954] Getting an email's subject is error-prone

2018-10-10 Thread R. David Murray


R. David Murray  added the comment:

Use the new email policies in python3.  They handle all the decoding for you.  
I'm afraid you are on your own for python2.

--
resolution:  -> out of date
stage:  -> resolved
status: open -> closed


[issue34954] Getting an email's subject is error-prone

2018-10-10 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +xtreak


[issue34954] Getting an email's subject is error-prone

2018-10-10 Thread Alex Corcoles

New submission from Alex Corcoles :

Hi,

This is something that has hit us a few times, as we write a significant 
quantity of software which parses email messages.

The thing is, we use email.header.decode_header to decode the Subject: header 
and it is pretty common for headers to be word-wrapped. If they are, 
decode_header will return a string with newlines in it.
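Here is a minimal reproduction using a made-up plain-ASCII subject. With the legacy compat32 policy, decode_header hands back the folded string, newline included, when there are no encoded words. The unfold helper is hypothetical. It sketches RFC 5322 unfolding for code that must stay on compat32. On Python 3, the fix discussed in this thread is to parse with policy.default instead.

```python
import email
import re
from email.header import decode_header

# A plain (non-encoded) Subject that the sending MUA folded onto two lines.
RAW = "Subject: a fairly long subject that the sending\n MUA decided to wrap\n\nbody\n"

msg = email.message_from_string(RAW)  # legacy compat32 policy
parts = decode_header(msg["Subject"])
# With no encoded words, decode_header returns [(original_str, None)],
# so the folding newline survives into the "decoded" subject.
raw_subject = parts[0][0]


def unfold(value):
    """Hypothetical helper: RFC 5322 unfolding (drop a line break followed by WSP)."""
    return re.sub(r"\r?\n(?=[ \t])", "", value)


print(repr(raw_subject))
print(repr(unfold(raw_subject)))
```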

This is unexpected for many people, and can cause bugs which are very 
difficult to detect in code review or testing, since it's easy never to 
trigger word-wrapping unless you do so deliberately.

We would humbly suggest to provide a friendly way to get an email's subject in 
the expected fashion (i.e. with no newlines) or point out this caveat in the 
docs (or maybe change decode_header to remove newlines itself).

Kind regards,

Álex

--
components: email
messages: 327481
nosy: Alex Corcoles, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Getting an email's subject is error-prone
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8
