[issue17505] [doc] email.header.Header.__unicode__ does not decode header
R. David Murray added the comment: The policy is named 'default' because it was intended to become the default two feature releases after the new email code became non-provisional (first: deprecate not specifying an explicit policy, next release make default the default policy and make the deprecation only cover compat32). However, for various reasons that switchover did not happen (one big factor being my reduced time spent doing python development). It can happen any time someone steps forward to guide it through the release process. -- ___ Python tracker <https://bugs.python.org/issue17505> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43333] utf8 in BytesGenerator
R. David Murray added the comment: Yeah, I think we need a complete example here. Note that in the general case there is no such thing as an RFC-valid email in unicode (which is what python strings are), though with utf8=True and an email involving only text you might get away with it. I assume you've tried policy=policy.default.clone(utf8=True) when creating the email? It will probably help to encode the 'text' to utf8 and use message_from_bytes to read it, but that may not be your only problem. It depends on exactly what is in the message and how the message gets recorded in your XML whether this is even going to work in the general case. The xml conversion may have already lost information, but hopefully not. -- ___ Python tracker <https://bugs.python.org/issue4> ___
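A minimal sketch of the suggestion above, assuming a message that contains only text. The sample message and the variable names are invented for illustration; the idea is to encode the string to UTF-8, parse it with message_from_bytes under a policy cloned with utf8=True, and serialize with the same policy.

```python
from email import message_from_bytes, policy

# Policy that allows raw UTF-8 in headers instead of RFC 2047 encoded words.
utf8_policy = policy.default.clone(utf8=True)

# Illustrative input only: a text-only message held in a Python string.
text = "From: sender@example.com\nSubject: caf\u00e9\n\nbody\n"

# Encode first, then parse from bytes, as suggested in the comment.
msg = message_from_bytes(text.encode("utf-8"), policy=utf8_policy)
subject = msg["Subject"]

# Serialize again with the same policy.
raw = msg.as_bytes(policy=utf8_policy)
print(subject)
print(raw)
```

Whether this is enough for a real message depends on what the message contains; binary attachments, for example, are outside what this sketch covers.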
[issue46392] MessageIDHeader is too strict for message-id
R. David Murray added the comment: The general idea is that the string version of the header should contain all of the original information, but the parsed elements (the things returned by special header attributes) will contain the valid data, if any. So if the string version of the header is being truncated or transformed (other than whitespace changes during re-folding), that is a bug. Your examples involve comment fields, and I'm afraid that my development of the parser stopped before I did very much with comments. Therefore I am not surprised that comments are handled incorrectly :( They aren't very common in the wild, as far as I was able to tell, which is why they were my last priority. -- ___ Python tracker <https://bugs.python.org/issue46392> ___
[issue46392] MessageIDHeader is too strict for message-id
R. David Murray added the comment: Note that the parser does attempt to accept obsolete syntax (registering defects for it), so if there is a bug in the implementation of the obsolete syntax handling it should be fixed. And yes, there have been other bugs with whitespace handling in the parser, unfortunately. Examples would be most helpful, even if you don't write unit tests. Most of the tests, by the way, are in test__header_value_parser (search for message_id). There aren't very many, so more would be good. -- ___ Python tracker <https://bugs.python.org/issue46392> ___
[issue12756] datetime.datetime.utcnow should return a UTC timestamp
R. David Murray added the comment: Note also that datetime.now() gives you a naive datetime. From an API consistency standpoint I think it makes sense that datetime.utcnow() gives a naive datetime. It would actually be confusing (IMO) for it to return an aware datetime. I can see why you might disagree, but backward compatibility wins in this case regardless. -- ___ Python tracker <https://bugs.python.org/issue12756> ___
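To make the naive/aware distinction concrete, here is a small sketch. The variable names are illustrative. Note that datetime.utcnow() emits a DeprecationWarning on Python 3.12 and later, so the warning is silenced here purely to keep the demonstration quiet.

```python
import warnings
from datetime import datetime, timezone

# Both of these are naive: they carry no tzinfo.
naive_local = datetime.now()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    naive_utc = datetime.utcnow()

# Passing a tz explicitly is how to get an aware UTC datetime.
aware_utc = datetime.now(timezone.utc)

print(naive_local.tzinfo, naive_utc.tzinfo, aware_utc.tzinfo)
```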
[issue46011] Python 3.10 email returns invalid Date: header unchanged.
R. David Murray added the comment: Yeah, I think there may be a general issue with getting header defects reflected somehow in message.defects, but that's a separate issue :) -- ___ Python tracker <https://bugs.python.org/issue46011> ___
[issue44637] Quoting issue on header Reply-To and other address headers
Change by R. David Murray : -- nosy: +thehesiod title: Quoting issue on header Reply-To -> Quoting issue on header Reply-To and other address headers ___ Python tracker <https://bugs.python.org/issue44637> ___
[issue45932] EmailMessage incorrectly splits name and address header
R. David Murray added the comment: This is a duplicate of #44637. -- resolution: -> duplicate stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue45932> ___
[issue45299] SMTP.send_message() does from mangling when it should not
R. David Murray added the comment: Your backward compatibility argument is persuasive. As you say, that means the BytesGenerator docs would need to be updated to note that that parameter is the exception to the rule for backward compatibility reasons. (If it is the only exception I have to wonder if I had a backward compatibility reason for doing it that way in the first place and just forgot to document it. It is too long ago to remember. It is even possible that effectively changing the default broke mbox and that's why it is an exception :) As for the send_message change, if mangle_from_ is the only exception then I think just passing it does make sense, maybe with a comment referencing the BytesGenerator docs for mangle_from_ to explain why it is needed. -- ___ Python tracker <https://bugs.python.org/issue45299> ___
[issue45299] SMTP.send_message() does from mangling when it should not
R. David Murray added the comment: In this case the docs are correct and the code has a bug. The intent was that if the message passed in to BytesGenerator has a policy, that policy should be followed. If it is not being followed, that's a bug in BytesGenerator. The tricky part of course is backward compatibility. Is there code out there depending on this bug? Anyone want to hazard a guess? Are there things other than mangle_from_ that are being ignored? If we decide it is too risky to fix in BytesGenerator (or maybe only to fix it in a feature release), then I'd pass the whole policy in the else clause, with a comment about what bug it is working around. -- ___ Python tracker <https://bugs.python.org/issue45299> ___
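For readers unfamiliar with from-mangling, the following sketch shows what the mangle_from_ argument to BytesGenerator does when it is passed explicitly. The helper name flatten and the sample message are made up for illustration, and this does not reproduce the smtplib code under discussion.

```python
from io import BytesIO
from email.generator import BytesGenerator
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "mangling demo"
# A body line starting with "From " looks like an mbox separator.
msg.set_content("From the start, this line looks like an mbox separator.\n")


def flatten(message, mangle):
    """Serialize message, passing mangle_from_ explicitly (illustrative helper)."""
    buf = BytesIO()
    gen = BytesGenerator(buf, mangle_from_=mangle, policy=message.policy)
    gen.flatten(message)
    return buf.getvalue()


mangled = flatten(msg, True)     # body line becomes ">From the start, ..."
unmangled = flatten(msg, False)  # body line is left alone
print(mangled.decode("ascii"))
print(unmangled.decode("ascii"))
```

Passing the value explicitly, as above, sidesteps any question of which default applies.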
[issue45551] EmailMessage utf-8 folding error
R. David Murray added the comment: I'm pretty sure this is a duplicate report and that we worked on a fix, but I don't know if it got committed because I can't find the issue... (To be clear, the problem here is the lack of whitespace at the start of the folded part of the header.) -- ___ Python tracker <https://bugs.python.org/issue45551> ___
[issue28973] [doc] The fact that multiprocess.Queue uses serialization should be documented.
R. David Murray added the comment: Mentioning ids would be pretty much redundant with mentioning pickle. If it is pickled its id is going to change. I think Davin was suggesting that while the use of serialization is documented, it is not documented *consistently*. Everywhere serialization happens it should be mentioned in the docs. Regardless, a proposed doc PR is the way forward here. -- ___ Python tracker <https://bugs.python.org/issue28973> ___
[issue44685] Email package issue with Outlook msg files
R. David Murray added the comment: That file appears to be a binary file? By itself it isn't enough to reproduce the problem. Can you provide a complete script as well as the email message you are parsing that demonstrates the problem? By "looks like any other eml file", are you including the MIME headers associated with the part? Because it is the MIME headers that contain the information you say is missing. Most likely, Outlook is not supplying that information for these transformed eml files. If you can supply a copy of the actual email message you are parsing, we should be able to confirm that. -- ___ Python tracker <https://bugs.python.org/issue44685> ___
[issue44694] Message from BytesParser cannot be flattened immediately
R. David Murray added the comment: I suspect maxheaderlen=0 works because it causes the original lines to be re-emitted without any folding or other processing. Without that, lines longer than the default max_line_length get refolded. Can you provide an example of an input message that triggers this problem? -- ___ Python tracker <https://bugs.python.org/issue44694> ___
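A sketch of the difference described above, using an invented message with one over-long Subject line. It is not the reporter's input, which was never supplied in this thread; it only shows a long header passing through untouched with maxheaderlen=0 and being refolded otherwise.

```python
from io import BytesIO
from email import policy
from email.generator import BytesGenerator
from email.parser import BytesParser

# Illustrative input: a header line well over the default 78 characters.
long_line = b"Subject: " + b" ".join([b"word"] * 30)
raw = long_line + b"\n\nbody\n"

msg = BytesParser(policy=policy.default).parsebytes(raw)


def flatten(message, **kwargs):
    """Serialize message with the given BytesGenerator options."""
    buf = BytesIO()
    BytesGenerator(buf, **kwargs).flatten(message)
    return buf.getvalue()


unfolded = flatten(msg, maxheaderlen=0)  # original line re-emitted as-is
refolded = flatten(msg)                  # long line is folded at the default width
print(unfolded.decode("ascii"))
print(refolded.decode("ascii"))
```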
[issue44660] email.feedparser: support RFC 6532 section 3.5
R. David Murray added the comment: Having looked at the cited part of the RFC (but not tried to analyze it in detail), I think you are correct. I've also glanced at your PR, and I think your approach is correct in broad outline, but I haven't looked at the details. For full message/global support, however, it will also be necessary to look at the output side: given a message/global part, a transfer encoding should be applied when serializing with cte_type=7bit. Support for message/global should also be added to the contentmanager. I won't have an objection if this is accepted with only the feedparser support, but I would recommend that the remaining pieces of support for message/global be added before the feature is released. -- ___ Python tracker <https://bugs.python.org/issue44660> ___
[issue43124] [security] smtplib multiple CRLF injection
R. David Murray added the comment: My apologies, I did not think about the possibility of an English issue. I was reacting to the "security report speak", which I find often makes a security issue sound worse than it is :) Thank you for reporting this problem, and I do think we should fix it. My posting was directed at the severity of the issue, since it was potentially holding up a release. My point about the example is that without an example of code that could reasonably be expected to use user input in a call that could inject newlines, we can treat this as a low priority issue. If we had a proposed example of such code, then the priority would be higher. If it was an example of such code "in the wild", then it would be quite high :) The reason I'm saying we should have an example in order to consider it higher priority is that I cannot see *any* likelihood that this would be a problem in practice. Let me explain. putcmd is an *internal* interface. If we look at the commands that call putcmd or docmd, the only ones that pass extra data that aren't pretty obviously safe (i.e., not clearly sanitized data) are rcpt and mail[*]. In both cases the item of concern is optionslist. optionslist is a list of *SMTP server options*. This is not data that is reasonably taken from user input, it is data provided *by the programmer*. [*] I did double check to make sure that email.utils.parseaddr sanitizes both \r and \n, just to be sure :) Therefore this is *not* a significant security issue. But as I said, we should take the "defense in depth" approach and apply the check in putcmd as you recommend. I just don't think it needs to hold up a release. -- ___ Python tracker <https://bugs.python.org/issue43124> ___
[issue44637] Quoting issue on header Reply-To
R. David Murray added the comment: Yes, compat32 uses a different parser and folder (the legacy ones), that have a lot of small bugs relative to the RFCs (which is why I rewrote it). -- ___ Python tracker <https://bugs.python.org/issue44637> ___
[issue44637] Quoting issue on header Reply-To
R. David Murray added the comment: Forget what I said about my different error, I made a mistake running the test script. Interesting. If it is related to the length of the name, then the problem is most likely in the folding algorithm, specifically in what happens when the "display-name" token is wrapped across lines. And indeed, if we clone the SMTP policy and set the max_line_length to 1000 in your sample script, it renders the header correctly. The problem here is that the surrounding quotation marks are added by the 'value' property of DisplayName, but that property isn't invoked when handling parts of the display name separately during multi-line folding. I was always bothered by the handling of the quotation marks in the part of the parser and folder dealing with quoted strings, but I never hit on a better way to do it. This, unfortunately, is going to be a non-trivial problem to solve. It is probably going to require an ugly hack in the folding code :( Really, the handling of quoted strings throughout the _header_value_parser code is...a hack :( There are probably other places where it breaks down during multi-line folding. If we are lucky the hack can just add special handling for the quoted-string token type in the folder. If we aren't it will get messier :( Glancing at the folder code (it's been a long time since I worked on it), one possible approach (not necessarily the best one) would be to mark the first and last sub-tokens in a quoted-string so that the folder knows to put in a leading or trailing quote mark, respectively, during folding. -- ___ Python tracker <https://bugs.python.org/issue44637> ___
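The workaround mentioned above (cloning the SMTP policy with a large max_line_length so the header is never folded) might look like the sketch below. The display name and address are invented, and this is not the reporter's sample script.

```python
from email.message import EmailMessage
from email.policy import SMTP

# Wide enough that the address header fits on one line and is not folded.
wide_policy = SMTP.clone(max_line_length=1000)

# Illustrative display name that needs quoting because of the comma and colon.
display_name = "Support Team, Example Corporation: Billing and Accounts Department"

msg = EmailMessage(policy=wide_policy)
msg["Reply-To"] = f'"{display_name}" <reply@example.com>'

rendered = msg.as_string()
print(rendered)
```

This only avoids the multi-line folding path; it does not fix the folder itself.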
[issue44637] Quoting issue on header Reply-To
R. David Murray added the comment: There is definitely a problem here, though I see a different problem when I run it (AttributeError: 'Group' object has no attribute 'local_part', presumably because of the ':' not getting escaped correctly). I believe it applies to any address header, not just Reply-To. Unfortunately I don't have time to investigate the cause, at least right now. An interesting first step on diagnosing it might be to produce a minimal example: start deleting special characters from inside that quoted string until you find the one (or ones) that is triggering it. -- ___ Python tracker <https://bugs.python.org/issue44637> ___
[issue43124] [security] smtplib multiple CRLF injection
R. David Murray added the comment: s/header injection/command injection/ -- ___ Python tracker <https://bugs.python.org/issue43124> ___
[issue43124] [security] smtplib multiple CRLF injection
R. David Murray added the comment: This bug report starts with "a malicious user with direct access to `smtplib.SMTP(..., local_hostname, ..)`", which is a senseless supposition. Anyone with "access to" the SMTP object could just as well be talking directly to the SMTP server and do anything they want that SMTP itself allows. The concern here is that data a program might obtain *from unsanitized user input* could be used to do header injection. The "proof of concept" does not address this at all. We'd need to see a scenario under which data that could reasonably be derived from user input ends up being passed as arguments to an smtplib method that calls putcmd with arguments. So, I would rate this as a *very* low impact issue, unless someone has an *actual example* of code using smtplib that passes user input through to smtplib commands in an exploitable way. That said, it is perfectly reasonable to be proactive here and prevent scenarios we haven't yet thought of, by doing as recommended (and a bit more) and raising a ValueError if 'args' in the putcmd call contains either \n or \r characters. I don't think we need to check 'cmd', because I can't see any scenario in which the SMTP command would be derived from user input. If you want to be *really* paranoid you could check cmd too, and since it will always be a short string the additional performance impact will be minor. -- type: performance -> security versions: +Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue43124> ___
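As a rough illustration of the proposed check, the standalone helper below rejects CR and LF in both the command and its arguments (the "really paranoid" variant). The function name build_command and the error message are hypothetical; this is a sketch of the idea, not the code in smtplib.

```python
def build_command(cmd, args=""):
    """Hypothetical stand-in for the check proposed for putcmd.

    Raises ValueError if either the command or its arguments contain
    a carriage return or a line feed.
    """
    for value in (cmd, args):
        if "\r" in value or "\n" in value:
            raise ValueError("command and arguments must not contain CR or LF")
    # Join command and arguments the way a single SMTP command line would look.
    return f"{cmd} {args}".strip()


ok = build_command("RCPT", "TO:<user@example.com>")

try:
    build_command("RCPT", "TO:<user@example.com>\r\nDATA")
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```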
[issue43493] EmailMessage mis-folding headers of a certain length
R. David Murray added the comment: Ah, yes, the problem is more subtle than I thought. The design here is that we should be starting with the largest lexical unit, seeing if that fits on the current line, or a line by itself, and if so, using that, and if not, move down to the next smaller lexical unit and try again, until we are finally left with an unbreakable unit. For unstructured headers such as Subject the lexical units should be encoded words followed by blank delimited words. I'm guessing the code is treating the collection of words it has accumulated as a unit in the above algorithm, and since it fits on a line by itself, it goes with that. So yeah, it's sort of intentional. So the bug here is that in your step 2 we ideally want to be considering whether the last token on the current line is at the same lexical level as the token that precedes it...and if so, and if moving that token to the next line lets the remainder fit on the first line, we should do that. Exactly how to implement that correctly is a good question...it's been too long since I wrote that code, and I may not have time to investigate it more deeply. If you come up with something based on my description of the intent above, I should be able to review it (though you might need to ping me directly to get my attention). -- ___ Python tracker <https://bugs.python.org/issue43493> ___
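To illustrate only the smallest level of that design (blank delimited words as the unit), here is a toy greedy folder. It is deliberately not the algorithm in _header_value_parser: it knows nothing about encoded words or nested lexical units, and the function name fold_words is made up.

```python
def fold_words(label, text, maxlen=78):
    """Toy folder: fill each line with as many whole words as fit.

    A word that does not fit starts a continuation line, which begins
    with a space so that it is valid folding whitespace.
    """
    lines = [label]
    for word in text.split():
        candidate = lines[-1] + " " + word
        if len(candidate) <= maxlen:
            lines[-1] = candidate
        else:
            lines.append(" " + word)
    return "\n".join(lines)


folded = fold_words("Subject:", " ".join(["word"] * 40), maxlen=40)
print(folded)
```

The point of the sketch is the behavior the comment asks for: text made of several words is split between lines rather than moved to a new line as one block.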
[issue39100] email.policy.SMTP throws AttributeError on invalid header
R. David Murray added the comment: How are you encountering this error? The following program runs without exception for me on master:

    from email import message_from_binary_file
    from email.policy import SMTP

    msg = message_from_binary_file(open('mail.eml', 'rb'), policy=SMTP)
    print(msg)

-- ___ Python tracker <https://bugs.python.org/issue39100> ___
[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA
R. David Murray added the comment: I can't tell for sure if this behavior is intentional or not from a quick glance at the code (though like you I wouldn't think it would be). That's part of the legacy api, at this point. The new api will just use utf8:

    from email.message import EmailMessage
    m = EmailMessage()
    m['Subject'] = '中文'
    print(bytes(m))

results in

    b'Subject: =?utf-8?b?5Lit5paH?=\n\n'

The fix, assuming it is correct, would be to add the line:

    'eucgb2312_cn': 'gb2312',

to the CODEC_MAP in email/charset.py, and then specify the internal codec name in your Charset call. I'm not sure that's right, though...once upon a time I think I understood the logic behind the charset module, but I no longer remember the details. I'd recommend just using the new API and not the legacy API. -- ___ Python tracker <https://bugs.python.org/issue44560> ___
[issue42892] AttributeError in email.message.get_body()
R. David Murray added the comment: Actually, I'm wrong. The body of a part can be a string, and that's what's going to happen with a malformed body of something claiming to be a multipart. The problem is that there is code that doesn't guard against this possibility. The following patch against master fixes the bug listed here, as well as iter_parts(). But it causes one test suite failure so it isn't a correct patch as it stands:

diff --git a/Lib/email/message.py b/Lib/email/message.py
index 3701b30553..d5d4a2385a 100644
--- a/Lib/email/message.py
+++ b/Lib/email/message.py
@@ -982,7 +982,7 @@ def _find_body(self, part, preferencelist):
             if subtype in preferencelist:
                 yield (preferencelist.index(subtype), part)
             return
-        if maintype != 'multipart':
+        if maintype != 'multipart' or not self.is_multipart():
             return
         if subtype != 'related':
             for subpart in part.iter_parts():
@@ -1087,7 +1087,7 @@ def iter_parts(self):
         Return an empty iterator for a non-multipart.
         """
-        if self.get_content_maintype() == 'multipart':
+        if self.is_multipart():
             yield from self.get_payload()
 
     def get_content(self, *args, content_manager=None, **kw):

Maybe someone can take this and finish it (with tests)...I may or may not have time to get back to this. -- ___ Python tracker <https://bugs.python.org/issue42892> ___
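The situation described above can be reproduced with a small invented message: it declares multipart/mixed but has no boundary, so the parser stores the body as a plain string. The helper safe_iter_parts is hypothetical and simply mirrors the is_multipart() guard from the patch; it is not part of the email package.

```python
from email import message_from_string, policy

# Illustrative input: claims to be multipart but has no boundary or parts.
raw = "Content-Type: multipart/mixed\n\nThis body has no boundary and no parts.\n"
msg = message_from_string(raw, policy=policy.default)

claims_multipart = msg.get_content_maintype() == "multipart"  # header says so
really_multipart = msg.is_multipart()                         # payload says otherwise


def safe_iter_parts(message):
    """Hypothetical helper: yield sub-parts only when the payload is a list."""
    if message.is_multipart():
        yield from message.get_payload()


parts = list(safe_iter_parts(msg))
print(claims_multipart, really_multipart, parts)
```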
[issue42892] AttributeError in email.message.get_body()
R. David Murray added the comment: Yes, that's the real question. That's what needs to be fixed, otherwise we'll just keep finding new bugs. For example, try calling iter_parts() on that message. It isn't pretty :) -- ___ Python tracker <https://bugs.python.org/issue42892> ___
[issue43922] Double dots in quopri transported emails
R. David Murray added the comment: As far as I know the only resources are the content manager docs and the source code. The stdlib content manager can serve as a model. I have to admit that it was long enough ago that I wrote that code that I'd have to re-read the docs and code myself to figure it out :) I'm afraid I don't really have time to do a complete review, but at a quick glance your patch doesn't look too complicated to me. Quick observation: the comment should explain why the dot check is done, and that it isn't needed for rfc compliance. -- ___ Python tracker <https://bugs.python.org/issue43922> ___
[issue43922] Double dots in quopri transported emails
R. David Murray added the comment: Since python is doing the right thing here, I don't see a particularly good reason to put a hack into the stdlib to fix the failure of third party software to adhere to standards. (On the output side. We do follow Postel's rule on input and try hard to handle broken but recoverable input.) I don't actually *object* to it, though, as long as it follows the standard on output, and is a *simple* change. Please note that you can fix this locally by implementing and using a custom content manager. -- ___ Python tracker <https://bugs.python.org/issue43922> ___
[issue43493] EmailMessage mis-folding headers of a certain length
R. David Murray added the comment: Parsing and newlines have nothing to do with this bug, actually. I don't think your foldfix post-processing is going to do what you want in the general case. The source of the bug here is in the folding algorithm in _header_value_parser. It has checks to see if the "text so far" will fit within the header width, and it starts a new line under various conditions. For example, if there is a single word after Subject: whose length is, say, 70, it would produce the effect you show, because the single word would fit without folding or encoding on a new line. I don't think this violates the RFC. What your example shows makes it look like the folder is treating all of the text as if it were a single word, which is obviously wrong. It is supposed to break at spaces. You will note that if you increase the repeat count in your example to 16 it folds the line correctly. So the bug has something to do with the total text so far accumulated for the line being right in that window where it won't fit on the first line but does fit on a line by itself. This is obviously a bug in the folder, since it should be splitting that text if it isn't a single word, not moving it to a new line as a whole. Note that this bug is still present on master. -- ___ Python tracker <https://bugs.python.org/issue43493> ___
[issue43090] parseaddr (from email.utils) returns invalid input string instead of ('', '')
R. David Murray added the comment: The return value is correct. Interpreted as an email address, 'randomstring' is a local mailbox. -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue43090> ___
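A quick check of the behavior described, with an ordinary name-plus-address input alongside for comparison (the second address is invented):

```python
from email.utils import parseaddr

# A bare word is treated as the address part (a local mailbox), with no real name.
bare = parseaddr("randomstring")

# For comparison, a conventional "Name <addr>" input.
full = parseaddr("Jane Doe <jane@example.com>")

print(bare)
print(full)
```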
[issue43061] subprocess: feature request: Get only the stdout of the last shell command
R. David Murray added the comment: This has nothing to do with python other than the fact that you are using it to capture stdout. You have to figure out how to get the output you want to be what shows up on stdout; python has no knowledge of what commands you put in your shell script, and it *cannot* have any knowledge of that. I think you need to learn more about basic shell scripting and unix pipelines and how stdout works. Also note that making people nosy on an issue is not a good idea if you are not part of the triage team. You should leave that for the bug triage people to do, as they know whose attention on the issue will be most useful. In the future when you open an issue please simply wait a while for a response. -- resolution: -> rejected stage: -> resolved status: open -> closed versions: -Python 3.10 ___ Python tracker <https://bugs.python.org/issue43061> ___
[issue42433] mailbox.mbox fails on non ASCII characters
R. David Murray added the comment: After thinking about it some more, I think that, given that mbox will happily treat *anything* as valid on the "From " line when there is no non-ascii, we should consider blowing up on non-ascii to be a bug. -- ___ Python tracker <https://bugs.python.org/issue42433> ___
[issue42484] get_obs_local_part fails to handle empty local part
R. David Murray added the comment: Yep, you've found another in a category of bugs that have shown up in the parser: places where there is a missing check for there being any value at all before checking character [0]. In this case, the fix should be to add

    if not obs_local_part:
        return obs_local_part, value

just before the if that is blowing up. -- title: parse_message_id, get_msg_id, get_obs_local_part is poorly written -> get_obs_local_part fails to handle empty local part ___ Python tracker <https://bugs.python.org/issue42484> ___
[issue42433] mailbox.mbox fails on non ASCII characters
R. David Murray added the comment: The problem with that archive is that it is not in proper mbox format. It contains the following line (5689): From here I was hoping to run something like “dbus-send –system –dest=Test.Me –print-reply /Japan Japan.Reset.Test string:”Hello”” You will note that there is no leading '>' on that line to escape that 'From '. So mbox tries to build a 'From ' line from it, and fails because 'From ' lines should not contain any non-ascii characters. It can be argued that that failure is sub-optimal...it should probably be calling decode('ascii', errors='replace') so that the parse doesn't fail, just like it would not fail if there were no non-ascii in the unescaped 'From ' line. -- ___ Python tracker <https://bugs.python.org/issue42433> ___
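The difference between strict ASCII decoding and decode('ascii', errors='replace') can be seen on any byte string with non-ascii content. The sample line below is invented and is not taken from the archive in question.

```python
# Illustrative "From " line containing UTF-8 encoded non-ascii text.
from_line = "From caf\u00e9 user@example.com".encode("utf-8")

# Strict decoding fails on the first non-ascii byte.
try:
    from_line.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors='replace', each undecodable byte becomes U+FFFD instead.
lenient = from_line.decode("ascii", errors="replace")

print(strict_failed, lenient)
```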
[issue41553] encoded-word abused for header line folding causes RFC 2047 violation
R. David Murray added the comment: Yes for the registry changes. I thought we had fixed the bug that was causing message-id to get encoded, but maybe it still exists in 3.7? I don't remember when we fixed it (and I may be remembering wrong!) As for X- "unstructured headers" getting trashed, by *definition* in the rfc, if the header body is unstructured it must support RFC encoding. If it does not, it is not an unstructured header field. Which is why I said we need to think about what characteristics the default parser should have. The RFC doesn't really speak to that, it expects every header to be one of the defined types...but while an X- header might be of a defined type, the email package can't know that unless it is told, so what should we use as the default parsing strategy? "text without encoded words" isn't really RFC compliant, I think. (Though I'll admit it has been a while since I last reviewed the relevant RFCs.) Note that I believe that we have an open issue (or at least an open discussion) that we should change the 'refold_source' default from 'long' to 'none', which means that X- headers would at least be passed through by default. It would also mitigate this problem, and can be used as a local workaround for headers that are just getting passed through and not modified. -- ___ Python tracker <https://bugs.python.org/issue41553> ___
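The local workaround mentioned at the end could be sketched as follows, assuming the header is only parsed and re-serialized, never modified. The header name X-Custom and its value are invented for illustration.

```python
from email import message_from_bytes, policy

# Policy that never refolds headers taken from the parsed source.
passthrough = policy.default.clone(refold_source="none")

# Illustrative input: an X- header far longer than the default 78 characters.
long_value = "token=" + "x" * 120
raw = f"X-Custom: {long_value}\n\nbody\n".encode("ascii")

msg = message_from_bytes(raw, policy=passthrough)
out = msg.as_bytes()
print(out.decode("ascii"))
```

Headers that are assigned or changed after parsing are still folded by the header machinery; the setting only affects source lines that pass through untouched.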
[issue41553] encoded-word abused for header line folding causes RFC 2047 violation
R. David Murray added the comment: It's not really an abuse. It is, however, buggy. It should be being applied *only* when the header contains unstructured text. Unfortunately I made the choice to treat any header that doesn't have a specific parser as unstructured, and that was a wrong choice which should be fixed. It is an interesting question what should be used as the default parser, though. Suggestions and code are welcome :) There should be specific header parsers for headers that contain message ids. That was on my todo list but did not get done before my circumstances changed and my free-time focus moved away from python development work :( The message_id parser exists. In-Reply-To just needs to be declared in the header registry as a MessageIDHeader (not sure how that got missed). Writing a Header class for References should be trivial, it's just a list of message ids. That will fix those headers, and I suggest we do that asap. Fixing the default-to-unstructured will take a bit more thought and should probably be split out into a separate issue. I can review and give advice (though you may have to ping me directly) but I won't have time to write any code. -- ___ Python tracker <https://bugs.python.org/issue41553> ___
[issue41402] email: ContentManager.set_content calls nonexistent method encode() on bytes
R. David Murray added the comment: The fix looks good to me. Don't know how I made that mistake, and obviously I didn't write a test for it... -- ___ Python tracker <https://bugs.python.org/issue41402> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41387] Escape needed in the email documentation example
Change by R. David Murray : -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue41387> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41145] EmailMessage.as_string is altering the message state and actually fix bugs
R. David Murray added the comment: The as_string docs say: "Flattening the message may trigger changes to the Message if defaults need to be filled in to complete the transformation to a string (for example, MIME boundaries may be generated or modified)." So, while this is indeed an API design bug, it isn't an actual bug in the code but rather is expected behavior, currently. The historical reason for this is that the generator code looks at the entire message to make sure the boundary string is unique. My long term plan for email included plans to rewrite the generator, and I was going to fix this issue at that point. My life got too busy to be able to continue with email development work, though, so that never happened. It has been *years* since I've looked at the code. Thinking about it now, I'm wondering if it would be possible to use a GUID technique to generate the boundary and thus do exactly as you say: have make_alternative (and anything else that causes a boundary to be needed) pre-create the boundary. That, I think, would mean we wouldn't need to change the generator, even though it would still be doing its (inefficient) check that the boundary was unique. I'm not sure if it would work, though; it's been too long since I've looked at the relevant code. -- type: resource usage -> behavior ___ Python tracker <https://bugs.python.org/issue41145> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
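The state change described here, and the pre-created boundary idea, can be illustrated from user code. This is only a sketch of the idea floated in the comment, not a change to the library: the build helper is hypothetical, and using uuid4 for the boundary is an assumption about what a "GUID technique" might look like.

```python
import uuid
from email.message import EmailMessage

def build(boundary=None):
    """Hypothetical helper: a plain-text message with an HTML alternative."""
    msg = EmailMessage()
    msg.set_content("plain text body\n")
    if boundary is not None:
        # Pre-create the boundary instead of leaving it to the generator.
        msg.make_alternative(boundary=boundary)
    msg.add_alternative("<p>html body</p>\n", subtype="html")
    return msg

# Default behavior: the boundary only exists after flattening.
lazy = build()
before = lazy.get_boundary()
lazy.as_string()
after = lazy.get_boundary()

# Assumed GUID-style boundary chosen up front; flattening keeps it.
preset = uuid.uuid4().hex
eager = build(boundary=preset)
eager.as_string()
```

In the first case before is None and after holds the generated boundary, which is the mutation the report is about. In the second case the message already carries its boundary, so as_string has nothing to fill in.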
[issue41206] behaviour change with EmailMessage.set_content
R. David Murray added the comment: I'm short of time, if someone could approve Mark's PR and merge it it would be great. There wasn't supposed to be any behavior change other than the one documented in #40597. -- ___ Python tracker <https://bugs.python.org/issue41206> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41023] smtplib does not handle Unicode characters
R. David Murray added the comment: If you use the 'sendmail' function for sending, then it is entirely your responsibility to turn the email into "wire format". Unicode is not wire format, but if you give sendmail a string that only has ascii in it, it nicely converts it to binary for you. But given that the email RFCs specify specific ways to indicate how non-ascii is encoded in the message, there is no way for the smtp library to know how to do that correctly when passed an arbitrary unicode string, so it doesn't try. sendmail requires *you* to do the encoding to binary, indicating you at least think that you got the RFC parts right :) In python2, strings are binary by default, so in that case you are handing sendmail binary format data (with the same assumption that you got the RFC parts right)...if you passed the python2 function a unicode string it would probably complain as well, although not in the same way. If your raw email is RFC compliant, then you can do: sendmail(from, to, mymsg.encode()). I see from your example that you are trying to use the email package to construct the email, which is good. But, emails are *binary*, they are not unicode, so passing "message_from_string" a unicode string containing non-ascii isn't going to do what you are expecting, any more than passing unicode to the 'sendmail' function did. message_from_string is really only useful for doing certain sorts of debug and ought to be deprecated. Or produce a warning when handed a string containing non-ascii. (There are historical reasons why it doesn't :( )
However, even if you encoded your raw message to binary and then passed it to message_from_bytes, your example message is *not* RFC compliant: without MIME headers, an email with non-ascii characters in the body is technically in violation of the RFC. Most email programs will handle that particular message despite that, but not all. You are better off using the email package to construct a properly RFC formatted email, using the new API (ex: msg = EmailMessage() (not Message), and then doing msg['from'] = address, etc, and msg.set_content(your unicode string body)). I can't really give you much advice here (nor should I, this being a bug tracker :) because I don't know exactly how the data is coming in to your program in your real use case. Once you have a properly constructed EmailMessage object, you should use smtplib's 'send_message' function, which understands email package messages and will Do the Right Thing with them (including the extraction of the to and from addresses your code is currently doing, as well as properly handling BCC, which means deleting BCC headers from the message before sending it, which your code does not do and which 'sendmail' would not do.) SMTPUTF8 is about non-ascii in the email *headers*, and most SMTP servers these days do not yet support it[*]. Some of the big ones do, though (I believe gmail does). [*] although that doesn't explain why what you got was SMTPSenderRefused. You should have gotten SMTPNotSupportedError. -- resolution: -> works for me stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue41023> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
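The approach recommended in this comment might look roughly like the following. It is a sketch under assumptions: the addresses, subject, body, and the build_message and send helpers are all invented, and the SMTP host is a placeholder, so send is defined but not called here.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Hypothetical helper: build a MIME-correct message with the new API."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # set_content adds the Content-Type, charset and transfer-encoding
    # headers needed for a non-ASCII body.
    msg.set_content(body)
    return msg

def send(msg, host="localhost"):
    """Hand the message object to smtplib instead of a raw string."""
    with smtplib.SMTP(host) as smtp:
        # send_message takes sender and recipients from the headers and
        # leaves Bcc out of what is transmitted.
        smtp.send_message(msg)

msg = build_message("alice@example.com", "bob@example.com",
                    "Grüße", "Käse und Brot\n")
wire = msg.as_bytes()  # the binary "wire format" discussed above
```

The point of the design is that the application never encodes anything by hand: the unicode body goes into set_content, and the email package decides how to represent it in bytes.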
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: New changeset 21017ed904f734be9f195ae1274eb81426a9e776 by Abhilash Raj in branch 'master': bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. (gh-17620) https://github.com/python/cpython/commit/21017ed904f734be9f195ae1274eb81426a9e776 -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40597] generated email message exceeds RFC-mandated limit of 998 characters
Change by R. David Murray : -- stage: backport needed -> resolved ___ Python tracker <https://bugs.python.org/issue40597> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40597] generated email message exceeds RFC-mandated limit of 998 characters
R. David Murray added the comment: New changeset c1f1ddf30a595c2bfa3c06e54fb03fa212cd28b5 by Miss Islington (bot) in branch '3.8': bpo-40597: email: Use CTE if lines are longer than max_line_length consistently (gh-20038) (gh-20084) https://github.com/python/cpython/commit/c1f1ddf30a595c2bfa3c06e54fb03fa212cd28b5 -- ___ Python tracker <https://bugs.python.org/issue40597> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40597] generated email message exceeds RFC-mandated limit of 998 characters
R. David Murray added the comment: Thanks, Arkadiusz. -- resolution: -> fixed stage: patch review -> backport needed versions: -Python 3.5, Python 3.6, Python 3.7 ___ Python tracker <https://bugs.python.org/issue40597> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40597] generated email message exceeds RFC-mandated limit of 998 characters
R. David Murray added the comment: New changeset 6f2f475d5a2cd7675dce844f3af436ba919ef92b by Arkadiusz Hiler in branch 'master': bpo-40597: email: Use CTE if lines are longer than max_line_length consistently (gh-20038) https://github.com/python/cpython/commit/6f2f475d5a2cd7675dce844f3af436ba919ef92b -- ___ Python tracker <https://bugs.python.org/issue40597> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40597] generated email message exceeds RFC-mandated limit of 998 characters
R. David Murray added the comment: The PR looks good to me, but I describe the change differently. I'm not sure how I missed this in the original implementation, since I obviously checked it for the 8bit case. Too long ago to remember :) -- ___ Python tracker <https://bugs.python.org/issue40597> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40359] email.parse part.get_filename() fails to unwrap long attachment file names (legacy API)
R. David Murray added the comment: As far as I know you currently still have to specify the policy. It was, yes, intended that 'default' become the actual default. I could have sworn there was an open issue for doing this, but I can't find it. I remember having a conversation with someone who said they were going to work on getting it done, but unfortunately I don't remember who :( I'm not very active in the python community currently so I can't really drive it, but it should definitely happen. -- ___ Python tracker <https://bugs.python.org/issue40359> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40359] email.parse part.get_filename() fails to unwrap long attachment file names (legacy API)
R. David Murray added the comment: Yeah, that looks like a bug in the old API. If you try the new API, it does the right thing. To do that, import email.policy and make your message_from_string call: email.message_from_string(raw, policy=email.policy.default) Note, however, that you really ought to be using message_from_bytes. Serialized email messages are bytes, not unicode, and using message_from_string will get you into other trouble. I don't know if it is worth fixing the old API. -- title: email.parse part.get_filename() fails to unwrap long attachment file names -> email.parse part.get_filename() fails to unwrap long attachment file names (legacy API) ___ Python tracker <https://bugs.python.org/issue40359> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
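For reference, the difference between the two calls can be seen in which message class comes back. This is a small sketch with an invented sample message and a deliberately short filename; it does not reproduce the long-filename bug from the report.

```python
import email
import email.policy
from email.message import EmailMessage, Message

# Hypothetical serialized message, as bytes (what arrives off the wire).
raw = (b"Content-Type: application/pdf\r\n"
       b"Content-Disposition: attachment; filename=\"report.pdf\"\r\n"
       b"\r\n"
       b"%PDF-\r\n")

# New API: an explicit policy selects EmailMessage and the new header parsers.
modern = email.message_from_bytes(raw, policy=email.policy.default)

# Legacy API: without a policy the compat32 behavior is used.
legacy = email.message_from_bytes(raw)

filename = modern.get_filename()
```

Only the first call goes through the header parsers that the comment says do the right thing; the second one is the old code path where the bug lives.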
[issue39073] email incorrect handling of crlf in Address objects.
Change by R. David Murray : -- stage: patch review -> backport needed ___ Python tracker <https://bugs.python.org/issue39073> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39073] email incorrect handling of crlf in Address objects.
R. David Murray added the comment: Thanks! -- ___ Python tracker <https://bugs.python.org/issue39073> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39073] email incorrect handling of crlf in Address objects.
R. David Murray added the comment: New changeset 614f17211c5fc0e5b828be1d3320661d1038fe8f by Ashwin Ramaswami in branch 'master': bpo-39073: validate Address parts to disallow CRLF (#19007) https://github.com/python/cpython/commit/614f17211c5fc0e5b828be1d3320661d1038fe8f -- ___ Python tracker <https://bugs.python.org/issue39073> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39966] mock 3.9 bug: Wrapped objects without __bool__ raise exception
R. David Murray added the comment: My guess is that it isn't so much that __bool__ is special, as that the evaluation of values in a boolean context is special. What you have to do to make a mock behave "correctly" in the face of that, I'm not sure (I haven't investigated). And I might be wrong. -- ___ Python tracker <https://bugs.python.org/issue39966> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39073] email incorrect handling of crlf in Address objects.
R. David Murray added the comment: Thanks for the PR. I've made some review comments. -- ___ Python tracker <https://bugs.python.org/issue39073> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27793] Double underscore variables in module are mangled when used in class
R. David Murray added the comment: You are welcome to open a doc-enhancement issue for the global docs. For the other, as noted already if you want to advocate for a change to this behavior you need to start on python-ideas, but I don't think you will get any traction. Another possible enhancement you could propose (in a new issue) is to have the global statement check for variables that start with '__' and do something appropriate such as issue a warning...although I don't really know how hard that would be to implement. -- ___ Python tracker <https://bugs.python.org/issue27793> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
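For readers who have not run into the behavior under discussion, a minimal illustration follows. The names __secret and Reader are invented; the mangling rule itself is standard Python behavior.

```python
# A module-level name that starts with two underscores.
__secret = "module-level value"

class Reader:
    def read(self):
        try:
            # Inside a class body this identifier is compiled as
            # _Reader__secret, so the module-level name is not found.
            return __secret
        except NameError as exc:
            return str(exc)

message = Reader().read()
```

The returned text names _Reader__secret rather than __secret, which is the surprise the issue title describes.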
[issue39771] EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output.
R. David Murray added the comment: I actually agree: if most (by market share) MUAs handle the RFC-incorrect parameter encoding style, and a significant portion does not handle the RFC correct style, then we should support the de-facto standard rather than the official standard as the default. I just wish Microsoft would write better software :) If on the other hand it is only microsoft out of the big market share players that is broken, I'm not sure I'd want it to be the default. But we could still support it optionally. So yeah, we could have a policy control that governs which one is actually used. So this is a feature request, and ideally should be supported by an investigation of what MUAs support what, by market share. And there's another question: does this only affect the filename parameter, or is it all MIME parameters? I would expect it to be the latter, but someone should check at least a few examples of that to be sure. -- stage: -> needs patch title: EmailMessage.add_header doesn't work -> EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output. type: behavior -> enhancement ___ Python tracker <https://bugs.python.org/issue39771> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39793] make_msgid fail on FreeBSD 12.1-RELEASE-p1 with different domains
R. David Murray added the comment: I don't object to this patch, but that sure looks like a broken system. -- ___ Python tracker <https://bugs.python.org/issue39793> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39757] EmailMessage bad encoding for international domain
R. David Murray added the comment: This is not actually a duplicate of 11783. Rereading (parts of) that issue, we decided we currently have no good way to do automatic conversion between unicode and internationalized domains, so the user of the library has to do it themselves. This means that the bug *here* is that the new email API is *wrongly* encoding the non-ascii in the domain by using an encoded word. I'm surprised at that; I thought I'd guarded against it. What should be happening here is that an error should be raised when that header is set (or possibly when it is accessed/serialized, but when set would be better I think) saying that there is non-ascii in the domain part. -- resolution: duplicate -> stage: resolved -> needs patch status: closed -> open superseder: email parseaddr and formataddr should be IDNA aware -> title: EmailMessage wrong encoding for international domain -> EmailMessage bad encoding for international domain ___ Python tracker <https://bugs.python.org/issue39757> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
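Since the conversion is left to the user of the library, one way to do it by hand is with Python's built-in idna codec. This is a sketch, not something the email package provides: the idna_address helper and the sample address are made up, and the built-in codec implements the older IDNA 2003 rules, which may or may not suit a given application.

```python
from email.headerregistry import Address
from email.message import EmailMessage

def idna_address(display_name, username, unicode_domain):
    """Hypothetical helper: convert the domain to its ASCII form first."""
    ascii_domain = unicode_domain.encode("idna").decode("ascii")
    return Address(display_name=display_name,
                   username=username,
                   domain=ascii_domain)

addr = idna_address("Buchladen", "info", "bücher.example")

msg = EmailMessage()
msg["To"] = addr
```

Because the domain is already pure ASCII when the header is set, the serializer has no non-ascii in the domain part to mis-encode.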
[issue39771] EmailMessage.add_header doesn't work
R. David Murray added the comment: Since Outlook is one of the mailers that generates the non-RFC-compliant headers, it doesn't surprise me all that much that it can't interpret the RFC compliant headers correctly. I'm not sure there is anything we can do here. I suppose someone could do a survey of mail clients and document which ones can handle which style of parameter encoding. If it turns out more handle the "wrong" way than handle the "right" way, we could consider adapting to the de-facto standard, although I won't like it much :) (There is also a possibility there is a bug in our RFC compliance, but this is the first problem report I've seen.) -- ___ Python tracker <https://bugs.python.org/issue39771> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39771] EmailMessage.add_header doesn't work
R. David Murray added the comment: The legacy API appears to be using an RFC-incorrect (but common) encoded-word encoding, while the new API is using the RFC-compliant MIME-parameter encoding (% encoding). Which email client are you using? -- ___ Python tracker <https://bugs.python.org/issue39771> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39771] EmailMessage.add_header doesn't work
R. David Murray added the comment: Actually, given that the contentmanager does accept a charset parameter for text content, it does seem reasonable to treat this as a bug. But as I said fixing it may not be trivial. -- ___ Python tracker <https://bugs.python.org/issue39771> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39771] EmailMessage.add_header doesn't work
R. David Murray added the comment: I think you are saying that you want the charset in the encoded filename to be GBK rather than utf-8? utf-8 should certainly display correctly in your email client, though, so if it is not there is something else going wrong. As far as the 3 tuple not working to set the charset...I believe what is happening there is that a header created by the application gets "refolded" on serialization, and refolding doesn't keep the existing charset, it converts everything to utf-8. This is an intentional part of the design: the library handles the gory details of MIME and uses utf-8 as the charset for application created content. It is actually an accident of the implementation that the tuple form of the filename is even accepted; you will note that it is *not* documented in the contentmanager docs. It wouldn't be crazy to ask for this as a feature, and it could even be treated as a bug that it doesn't work if we want to, but it may not be easy to "fix", because it goes against the design philosophy of the new API. -- ___ Python tracker <https://bugs.python.org/issue39771> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39384] Email parser creates a message object that can't be flattened as bytes.
R. David Murray added the comment: message_from_bytes -- ___ Python tracker <https://bugs.python.org/issue39384> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39384] Email parser creates a message object that can't be flattened as bytes.
R. David Murray added the comment: If we can get an actual reproducer using message_as_bytes I'd feel more comfortable with the fix. I worry that there is some other bug this is exposing that should be fixed instead. -- ___ Python tracker <https://bugs.python.org/issue39384> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10740] sqlite3 module breaks transactions and potentially corrupts data
R. David Murray added the comment: Please open a new issue for this question. -- ___ Python tracker <https://bugs.python.org/issue10740> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24337] Implement `http.client.HTTPMessage.__repr__` to make debugging easier
R. David Murray added the comment: Thanks for the PR, but I've noted an issue on the review. In any case we should agree on what goes in the repr here in this issue before actually implementing anything. -- ___ Python tracker <https://bugs.python.org/issue24337> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39309] Please delete my account
R. David Murray added the comment: AFAIR it can only be done using the roundup command line on the server. -- nosy: +ezio.melotti ___ Python tracker <https://bugs.python.org/issue39309> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39384] Email parser creates a message object that can't be flattened as bytes.
R. David Murray added the comment: Since you parsed it as a string it is not really legitimate to serialize it as bytes. (That will work if the input message only contains ascii, but not if it contains unicode). You'll get the same error if you replace the garbage with the "’". Using errors=replace is not crazy, but it hides the actual problem. Let's see what other people think :) In theory you could "fix" this by encoding the unicode using the charset specified by the container. I have no idea how complicated it will be to do that, and it would be a new feature: parsing strings is specified to only work with ASCII input, currently. I put "fix" in quotes, because even if you make text parts like this example work, you still can't handle non-text 8bit mime parts. Is it worth doing anyway? Really, message_from_string and friends should just be avoided entirely, maybe even deprecated. -- ___ Python tracker <https://bugs.python.org/issue39384> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23434] support encoded filename in Content-Disposition for HTTP in cgi.FieldStorage
R. David Murray added the comment: Are you saying there is no (http) RFC compliant way to fix this, or no way to fix it with the email library parsers? If the latter, the library is pretty flexible and for internal stdlib use it would probably be permissible to directly call methods in the internal parsing module, if those would be useful. I haven't re-read the issue to reload my brain, so this question may be off point (except for the first clause of the question). -- ___ Python tracker <https://bugs.python.org/issue23434> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23147] Possible error in _header_value_parser.py
R. David Murray added the comment: Thanks for the ping. Whether or not Serhiy's patch fixed the original problem, the algorithm rewrite has happened so this issue is no longer relevant in any case. -- stage: test needed -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue23147> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: I don't see the change to the test in the PR. Did you miss a push or is github doing something wonky with the review? (I haven't used github review in a while and I had forgotten how hard it is to use...) -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39131] signing needs two serialisation passes
R. David Murray added the comment: Ideally this should be exposed by extending the content manager. Instantiating MIME classes is part of the old API, not the new. The code in the PR may well be correct, but the class should be hidden from the normal user (of the new API). I'm not sure what the best way to specify the signing function will be, but I'm guessing a new keyword parameter in the content API. Note that the current content management API is more of a framework than a fully worked out system, so figuring out the best way to add this may require some design discussion. -- ___ Python tracker <https://bugs.python.org/issue39131> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: One more tweak to the test and we'll be good to go. -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39073] email incorrect handling of crlf in Address objects.
R. David Murray added the comment: Hmm. Yes, \r\n should be disallowed in the arguments to Address. I thought it already was, so that's a bug. That bug produces the other apparent bug as well: because the X: was treated as a separate line, the previous header did not need double quotes so they are no longer added. So there's no 3.8 specific bug here, but there is a bug. -- title: email regression in 3.8: folding -> email incorrect handling of crlf in Address objects. ___ Python tracker <https://bugs.python.org/issue39073> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
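On interpreters that include the fix for this issue (bpo-39073), the validation can be observed directly. The is_rejected helper and the sample values are invented for this sketch; older versions without the fix would accept the first call instead of raising.

```python
from email.headerregistry import Address

def is_rejected(**parts):
    """Hypothetical helper: report whether Address refuses these parts."""
    try:
        Address(**parts)
    except ValueError:
        return True
    return False

# A display name that tries to smuggle in an extra header line.
injected = is_rejected(display_name="Mallory\r\nX: injected",
                       username="mallory", domain="example.com")

# An ordinary address for comparison.
clean = is_rejected(display_name="Alice",
                    username="alice", domain="example.com")
```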
[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent
R. David Murray added the comment: All of which isn't to discount that you might have found a bug, by the way, if you want to investigate further :) -- ___ Python tracker <https://bugs.python.org/issue39071> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39071] email.parser.BytesParser - parse and parsebytes work not equivalent
R. David Murray added the comment: The problem is that you are starting with different inputs. unicode strings and bytes are different things, and so parsing them can produce different results. The fact of the matter is that email messages are defined to be bytes, so parsing a unicode string pretending it is an email message is just asking for errors anyway. The string parsing methods are really only provided for backward compatibility and historical reasons. I thought this was clear from the existing documentation, but clearly it isn't :) I'll review a suggested doc change, but the thing to explain is not that parse and parsebytes might produce different results, but that parsing email from strings is not a good idea and will likely produce unexpected results for anything except the simplest non-mime messages. Note: the reason you got different checksums might have had to do with line ends, depending on how you calculated the checksums. You should also consider using get_content and not get_payload. get_payload has a weird legacy API that doesn't always do what you think it will, and that might be another source of checksum issues. But really, parsing a unicode representation of a mime message is just likely to be buggy. -- ___ Python tracker <https://bugs.python.org/issue39071> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
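The recommended combination, parsing bytes and reading the body with get_content, might look like this. The sample message is invented and kept to a single text part; it is a sketch of the usage pattern, not a reproduction of the checksum problem from the report.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical serialized message with an 8bit UTF-8 body, as bytes.
raw = ("MIME-Version: 1.0\r\n"
       "Content-Type: text/plain; charset=utf-8\r\n"
       "Content-Transfer-Encoding: 8bit\r\n"
       "\r\n"
       "Café\r\n").encode("utf-8")

# Parse the bytes directly; no intermediate unicode string is involved.
msg = BytesParser(policy=policy.default).parsebytes(raw)

# get_content decodes the body using the declared charset and returns str.
text = msg.get_content()
```

If a checksum of the body is needed, computing it over bytes with known line endings avoids the ambiguity mentioned in the comment.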
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: In general your solution looks good, just a few naming comments and an additional test request. -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: The example you want to look at is get_unstructured. That shows both lookback and modification of the parse tree to handle the whitespace between encoded words. -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: And you are right that this is a very common bug in email programs. So common that I suspect the RFC folks will eventually have to accept it as a de-facto standard. So we do need to support it in the python email library. -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: Yes, google should fix their bug. However, the python email package tries very hard to interpret even RFC-non-compliant emails when there is a way to do so. As I said, the package already tries to interpret headers such as google is generating, it's just that there is a bug in that interpretation: it is keeping the blank between the encoded words when it should not be. That bug can be fixed, in get_raw_encoded_word and/or get_parameter, in email._header_value_parser. -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: That header is *completely* non-RFC compliant. If gmail generated that header there is something very wrong in google-land :( The RFC compliant formatting for that header looks like this: Content-Disposition: attachment; filename*=utf-8''Schulbesuchsbest%C3%A4ttigung.pdf You will note that this is nothing like encoded word format. Encoded words are not valid inside quoted strings, and quoted strings can't be used in mime header attributes if there are non-ascii characters involved. Nor can encoded words. Now, all that said, there is an obvious rule that can be followed to understand what that header is trying to convey, and the current parser already implements most of it (you will find comments about it in the parser, as well as defects being registered). So, a patch to _header_value_parser to fix the error recovery will be accepted. I've looked at the code to remind myself, but not deeply enough to be *sure* where the changes need to be made. There are two possibilities I see off the bat (and both may need fixing): get_bare_quoted_string and get_parameter. Either one or both of those may be forgetting that whitespace between encoded words should be dropped. -- ___ Python tracker <https://bugs.python.org/issue39040> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
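The RFC-compliant form quoted in this comment can be checked against the new-API parser. A minimal sketch, assuming the header arrives on one line inside an otherwise invented message:

```python
from email import message_from_bytes, policy

# The RFC 2231 style header from the comment, inside a made-up message.
raw = (b"Content-Type: application/pdf\r\n"
       b"Content-Disposition: attachment; "
       b"filename*=utf-8''Schulbesuchsbest%C3%A4ttigung.pdf\r\n"
       b"\r\n")

msg = message_from_bytes(raw, policy=policy.default)

# Both accessors give the percent-decoded, charset-decoded filename.
filename = msg.get_filename()
param = msg["Content-Disposition"].params["filename"]
```

Note that the value carries its charset in the parameter itself (utf-8''...), with no quotes and no =?...?= encoded words, which is the contrast the comment draws with the header Gmail produced.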
[issue39040] Wrong attachement filename when mail mime header was too long
R. David Murray added the comment: Thanks for the report. Can you provide an example that reproduces the problem? Per the RFC, lines may be broken before whitespace in certain places in certain headers, but that does not make the whitespace go away. Only the CRLF sequence is removed when unfolding the header, per the RFC, so your proposed fix is incorrect. I suspect your example header is invalid, and the question then becomes whether there is some sort of Postel-style error recovery we can and want to do in the function that parses the content-disposition header. -- ___ Python tracker <https://bugs.python.org/issue39040> ___
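The unfolding rule can be sketched in a few lines. This is an illustration of the rule only, not the code the email package uses; `unfold` is a hypothetical helper name, and tolerating a bare LF is an assumption made for convenience:

```python
import re


def unfold(header: str) -> str:
    """Remove only the line break of each fold; keep the whitespace after it."""
    # A fold is a CRLF (a bare LF is tolerated here) followed by a space or tab.
    return re.sub(r"\r?\n(?=[ \t])", "", header)


folded = "Subject: a fairly long\r\n subject line"
print(repr(unfold(folded)))
```

The lookahead leaves the leading space or tab of the continuation line in place, which is why the whitespace survives unfolding.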
[issue38625] SpooledTemporaryFile does not seek correctly after being rolled over
R. David Murray added the comment: The docs currently say "The returned object is a file-like object whose _file attribute is either an io.BytesIO or io.StringIO object (depending on whether binary or text mode was specified) or a true file object, depending on whether rollover() has been called." The fact that taking an iterator gets you whatever the *current* _file object is is implied by that but not made explicit. A doc update to make that explicit would probably be appropriate. -- ___ Python tracker <https://bugs.python.org/issue38625> ___
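A small sketch of what the quoted documentation describes, assuming binary mode and an arbitrary max_size chosen for the example:

```python
import io
import tempfile

with tempfile.SpooledTemporaryFile(max_size=1024, mode="w+b") as spooled:
    spooled.write(b"line one\nline two\n")
    # Before rollover the data lives in an in-memory buffer.
    before = type(spooled._file)
    spooled.rollover()
    # After rollover _file is a different object backed by a real file.
    after = type(spooled._file)

print(before, after)
```

Anything that captured the old `_file` object before the rollover, such as an iterator, keeps pointing at the in-memory buffer.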
[issue38698] While parsing email message id: UnboundLocalError
R. David Murray added the comment: Actually, the success path there should also check that value is empty, and if it is not, register a defect for that as well. -- ___ Python tracker <https://bugs.python.org/issue38698> ___
[issue38672] mimetypes.init() fails if no access to one of known files
R. David Murray added the comment: I haven't looked at this in detail, but here are my general thoughts:

I think it would be reasonable to expect that the module would function even if the file permissions are screwed up, similar to how unix commands that try to read .netrc will (try to) function even if its permissions are wrong. I would, however, expect the module to emit a warning in that case.

I'm of two minds about the behavior when the caller specifies filenames explicitly. I could see that going either way, but I lean slightly toward making the behavior consistent. While the programmer might appreciate the traceback, the user of the program would probably appreciate the "try to keep going" behavior, since the filenames provided will often be in the same class of "standard defaults" as the existing well known files are, just in the context of that particular application. But like I said, that is just a lean, and I could go the other way on this as well :)

I haven't looked at the isfile issue, but it seems reasonable that if the path exists we should make sure it is a file before reading it...but perhaps readfp will effectively do that? Write a test and see what happens :)

I don't know whether to call this change a bug fix or a feature, so I guess we'd default to feature unless someone can tilt the balance with an argument :) -- ___ Python tracker <https://bugs.python.org/issue38672> ___
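The "warn and keep going" behavior could look roughly like the sketch below. This is not what mimetypes does today; `read_known_files` is a hypothetical helper and the path is made up so that the read fails:

```python
import mimetypes
import warnings


def read_known_files(paths):
    """Hypothetical helper: load every readable file, warn about the rest."""
    db = mimetypes.MimeTypes()
    for path in paths:
        try:
            db.read(path)
        except OSError as exc:
            # Covers missing files, permission problems and directories.
            warnings.warn(f"skipping mime.types file {path!r}: {exc}")
    return db


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    db = read_known_files(["/nonexistent/mime.types"])

print(len(caught), db.guess_type("notes.txt"))
```

The built-in defaults are still available afterwards, so the caller gets a working database plus a warning instead of a traceback.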
[issue38698] While parsing email message id: UnboundLocalError
R. David Murray added the comment: More tests are always good :) The "correct" solution here (as far as I remember, it has been a while since I've had time to even look at the _header_value_parser code) would be to add a new 'invalid-msg-id' token, and do this:

    message_id = MessageID()
    try:
        token, value = get_msg_id(value)
        message_id.append(token)
    except HeaderParseError as ex:
        message_id = InvalidMessageID(value)
        message_id.defects.append(InvalidHeaderDefect(
            f"Invalid msg_id: {ex}"))
    return message_id

-- ___ Python tracker <https://bugs.python.org/issue38698> ___
[issue37532] email.header.make_header() doesn't work if any `ascii` code is out of range(128)
R. David Murray added the comment: Right, and the python email package fully supports non ascii:

    >>> from email.message import EmailMessage
    >>> msg = EmailMessage()
    >>> msg['Subject'] = "Panamá- Casco Antiguo"
    >>> bytes(msg)
    b'Subject: =?utf-8?q?Panam=C3=A1-?= Casco Antiguo\n\n'
    >>> str(msg)
    'Subject: Panamá- Casco Antiguo\n\n'
    >>> msg['subject']
    'Panamá- Casco Antiguo'

make_header also supports non-ascii, you just have to tell it what charset you want to use. Like I said, make_header is part of the *legacy* API, and it really is a pain to use. That's why we wrote the new API. -- ___ Python tracker <https://bugs.python.org/issue37532> ___
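For comparison, a minimal sketch of the legacy route, where the charset has to be passed to make_header explicitly. The exact encoded form is not asserted here, only that it is ASCII and decodes back to the original text:

```python
from email.header import decode_header, make_header

text = "Panamá- Casco Antiguo"

# Legacy API: each chunk is a (string, charset) pair.
header = make_header([(text, "utf-8")])
encoded = header.encode()

# Decoding the encoded form gives the original text back.
decoded = str(make_header(decode_header(encoded)))
print(encoded)
print(decoded)
```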
[issue37532] email.header.make_header() doesn't work if any `ascii` code is out of range(128)
R. David Murray added the comment: The input header is not valid (non-ascii is not allowed in headers), so you shouldn't expect make_header to do anything sensible. Note that this is the legacy API, which is a toolkit and does not hold your hand when it comes to RFC compliance. Aside from any other concerns, this is long standing behavior (it is the same in python2), and it doesn't make sense to change the behavior of a legacy API. -- resolution: -> not a bug stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue37532> ___
[issue37491] IndexError in get_bare_quoted_string
Change by R. David Murray : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue37491> ___
[issue37492] should email.utils.parseaddr treat a@b. as invalid email ?
R. David Murray added the comment: Right, those absolutely are valid addresses. A resolver will normally look up a name with an internal dot first as if it were an FQDN, but if it does so and does not get an answer it will then look it up again as a "local" address (appending in turn the strings from the 'search' directive in resolv.conf or equivalent) *if* it does not end in a final dot. If it does end in a final dot, no further lookup as local is done. While it isn't *normal* to send email to a TLD using a trailing dot, it is *legal*. In theory the address 'postmaster@com.' ought to be a valid email address (I doubt that it actually is, though). On the other hand, I will be very surprised if *all other* TLDs are without valid email addresses, especially the new ones. It is also easy to imagine an environment using email with private single label domain names using trailing dots specifically to suppress appending of search domains for sandboxing reasons. Thus the email library must support it as valid, both for RFC reasons and for practical reasons. -- ___ Python tracker <https://bugs.python.org/issue37492> ___
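A short sketch of the behavior being defended, using the address from the comment above:

```python
from email.utils import parseaddr

# The trailing dot is kept as part of the domain rather than rejected.
name, addr = parseaddr("postmaster@com.")
print((name, addr))
```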
[issue37482] Email address display name fails with both encoded words and special chars
R. David Murray added the comment: The display name is a phrase, and a phrase is a sequence of words, and a word is either a quoted string or an atom. So it is legal to mix quoted strings and encoded words in a display name. I'd vote to do whichever one is easier to implement :) (I haven't looked at your PR yet and unfortunately my time is limited :( ) -- ___ Python tracker <https://bugs.python.org/issue37482> ___
[issue37482] Email address display name fails with both encoded words and special chars
R. David Murray added the comment: FYI, it would have been most helpful if you had posted your example in the issue text instead of as an attached file, as it explains the problem better than your text does :) Here is a minimal reproducer:

    >>> m = EmailMessage(policy=strict)
    >>> m['From'] = '"Foo Bar, España" '
    >>> bytes(m)
    b'From: Foo Bar, =?utf-8?q?Espa=C3=B1a?= \n\n'

This serialization of the header is, as you say, invalid. Either the comma should be encoded, or the "Foo Bar," should be in quotes. -- ___ Python tracker <https://bugs.python.org/issue37482> ___
[issue37357] mbox From line wrongly detected
R. David Murray added the comment: This problem is the whole reason "mangle_from" exists in the email library... -- ___ Python tracker <https://bugs.python.org/issue37357> ___
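A minimal sketch of what mangling does when a message is serialized, assuming a made-up body in which one line happens to start with "From ":

```python
import io
from email.generator import Generator
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "mangling demo"
msg.set_content("Hello,\nFrom here on the body looks like a separator.\n")

buf = io.StringIO()
# mangle_from_=True escapes body lines that start with "From ".
Generator(buf, mangle_from_=True).flatten(msg)
flattened = buf.getvalue()
print(flattened)
```

With the escaped line written as ">From ...", an mbox reader no longer mistakes it for the start of the next message.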
[issue31445] Index out of range in get of message.EmailMessage.get()
R. David Murray added the comment: Note that the reporter indicated that the message was an instance of EmailMessage (the new API). You'd need to use policy=default to get that using message_from_string. But yes, this was fixed in another issue. -- stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue31445> ___
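The difference is easy to check. A small sketch, with a made-up message, showing which class message_from_string returns with and without the policy argument:

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default

raw = "Subject: hello\n\nbody\n"

# Without a policy the parser uses compat32 and returns a legacy Message.
legacy = message_from_string(raw)
# With policy=default it returns an EmailMessage (the new API).
modern = message_from_string(raw, policy=default)

print(type(legacy).__name__, type(modern).__name__)
```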
[issue32179] Empty email address in headers triggers an IndexError
R. David Murray added the comment: BareQuotedString implies the new API is being used, though that was not made clear in the report. However, unlike the other recently closed issue, this one was in fact fixed (and I have a vague memory of reviewing the PR):

    >>> m = message_from_string('ReplyTo: ""', policy=default)
    >>> m['ReplyTo']
    '""'

-- ___ Python tracker <https://bugs.python.org/issue32179> ___
[issue32178] Some invalid email address groups cause an IndexError instead of a HeaderParseError
R. David Murray added the comment: The fact that the original report mentions HeaderParseError implies that the new API is being used, though the report didn't make that clear. The problem still exists:

    >>> m = message_from_string("To: :Foo \n\n", policy=default)
    >>> m['To']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/rdmurray/python/p38/Lib/email/message.py", line 391, in __getitem__
        return self.get(name)
      File "/home/rdmurray/python/p38/Lib/email/message.py", line 471, in get
        return self.policy.header_fetch_parse(k, v)
      File "/home/rdmurray/python/p38/Lib/email/policy.py", line 163, in header_fetch_parse
        return self.header_factory(name, value)
      File "/home/rdmurray/python/p38/Lib/email/headerregistry.py", line 602, in __call__
        return self[name](name, value)
      File "/home/rdmurray/python/p38/Lib/email/headerregistry.py", line 197, in __new__
        cls.parse(value, kwds)
      File "/home/rdmurray/python/p38/Lib/email/headerregistry.py", line 343, in parse
        groups.append(Group(addr.display_name,
      File "/home/rdmurray/python/p38/Lib/email/_header_value_parser.py", line 315, in display_name
        return self[0].display_name
      File "/home/rdmurray/python/p38/Lib/email/_header_value_parser.py", line 382, in display_name
        return self[0].display_name
      File "/home/rdmurray/python/p38/Lib/email/_header_value_parser.py", line 564, in display_name
        if res[0].token_type == 'cfws':
    IndexError: list index out of range

-- resolution: out of date -> status: closed -> open ___ Python tracker <https://bugs.python.org/issue32178> ___
[issue19645] decouple unittest assertions from the TestCase class
R. David Murray added the comment: "But - what are we solving for here?" I'll tell you what my fairly common use case is. Suppose I have some test infrastructure code, and I want to make some assertions in it. What I invariably end up doing is passing 'self' into the infrastructure method/class just so I can call the assert methods from it. I'd much rather be just calling the assertions, without carrying the whole test object around. It *works* to do that, but it bothers me every time I do it or read it in code, and it makes the infrastructure code needlessly more complicated and slightly harder to understand/read. -- ___ Python tracker <https://bugs.python.org/issue19645> ___
[issue27737] email.header.Header.encode() crashes with IndexError on spaces only value
R. David Murray added the comment: New changeset 0416d6f05a96e0f1b3751aa97abfffe6d3323976 by R. David Murray (Miss Islington (bot)) in branch '3.7': bpo-27737: Allow whitespace only headers encoding (GH-13478) (#13517) https://github.com/python/cpython/commit/0416d6f05a96e0f1b3751aa97abfffe6d3323976 -- ___ Python tracker <https://bugs.python.org/issue27737> ___
[issue36520] Email header folded incorrectly
R. David Murray added the comment: Never mind, I was testing with the wrong version of python. This bug was introduced somewhere after 3.4 :(

    >>> from email.message import EmailMessage
    >>> m = EmailMessage()
    >>> m['Subject'] = 'Hello Wörld! Hello Wörld! Hello Wörld! Hello Wörld!Hello Wörld!'
    >>> bytes(m)
    b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?=\n\n'

-- ___ Python tracker <https://bugs.python.org/issue36520> ___
[issue36520] Email header folded incorrectly
R. David Murray added the comment: Can you demonstrate the problem with an actual email object? header_store_parse is not meant to be called directly. -- ___ Python tracker <https://bugs.python.org/issue36520> ___