[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: I'm not going to have time to look at the PR for a couple days. I don't understand what the use case is for writing or appending with filenames in a non-UTF-8 encoding. At least in my experience, reading such files is rare, but I have never been asked to write one. The correspondents who send me zipfiles with the directory encoded in shift_jisx0213 are perfectly happy to read zipfiles with the directory encoded in UTF-8. If that is true for other users, then unzipping the file to a temporary directory with the appropriate --metadata-encoding, adding the required paths there, and zipping a new archive seems perfectly workable. In that case making this feature read-only makes the most sense to me. -- ___ Python tracker <https://bugs.python.org/issue28080> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39673] Map errno==ETIME to TimeoutError
Stephen J. Turnbull added the comment: First, let me say I like Giampaolo's TimeoutExpired *much* better as the name for this kind of exception! But that ship has sailed. I don't understand Giampaolo's comment. If I understand the claim correctly, the problem is that people who should be catching some application-specific exception may be misled into catching TimeoutError instead, or into trying to get application-specific attributes from TimeoutError. But that ship sailed with the creation of TimeoutError. (We have a whole fleet sailing with this exception.) Unless Giampaolo is proposing to deprecate TimeoutError? I'm sympathetic ;-), but deprecation is a PITA and takes forever. If we're not going to deprecate, it seems to me that it's much more developer-friendly to catch ETIME with TimeoutError, as that seems very likely to be the expected behavior. It's true that even if Giampaolo changes TimeoutExpired to subclass TimeoutError, generic TimeoutError won't have .seconds. But if you catch a TimeoutExpired with TimeoutError, that instance *will* have .seconds, and if you try to get .seconds on generic TimeoutError, you'll get a different uncaught exception (AttributeError vs. TimeoutError), but that TimeoutError wouldn't have been handled by catching TimeoutExpired. I agree with Eric that people who were distinguishing OSError with .errno=ETIME from TimeoutError might be at risk, but I wouldn't do that: if I were going to be distinguishing particular OSErrors on the basis of errno (other than in "Unexpected OSError (errno = %d)" reporting style), I'd just catch OSError and do that. On the other hand, I might expect TimeoutError to catch ETIME. And Giampaolo says he's never seen either. I suppose the author of psutil would be as likely as anyone to have seen it! On net (unless we go the deprecation route) it seems that the convenience and "intuition" of adding ETIME to TimeoutError outweighs that risk. 
I wish there were somebody who was there at the creation of ETIME! -- nosy: +sjt ___ Python tracker <https://bugs.python.org/issue39673> ___
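The pattern preferred in the comment is to catch `OSError` and dispatch on `errno`. That pattern keeps working whether or not ETIME is mapped to `TimeoutError`. This is a minimal sketch, and the helper names are hypothetical. `errno.ETIME` is looked up defensively because it is not defined on every platform.

```python
import errno

# errno values this sketch treats as "timed out". ETIME is not defined
# on every platform, so it is only added when the errno module has it.
TIMEOUT_ERRNOS = {errno.ETIMEDOUT}
if hasattr(errno, "ETIME"):
    TIMEOUT_ERRNOS.add(errno.ETIME)


def is_timeout(exc: OSError) -> bool:
    """True for TimeoutError and for any OSError carrying a timeout errno."""
    return isinstance(exc, TimeoutError) or exc.errno in TIMEOUT_ERRNOS


def fail_with(err: int) -> None:
    """Stand-in for a system call that failed with errno `err`."""
    # OSError's constructor picks a subclass from the errno (PEP 3151),
    # e.g. ETIMEDOUT produces a TimeoutError instance.
    raise OSError(err, "simulated failure")


try:
    fail_with(errno.ETIMEDOUT)
except OSError as e:
    print(type(e).__name__, is_timeout(e))   # TimeoutError True
```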
[issue29352] provide the authoritative source for s[i:j] negative slice indices (<-len(s)) behavior for standard sequences
Stephen J. Turnbull added the comment: I prefer Josh's wording. The important point to me is that

>>> s = [1, 2]
>>> s[2:0] = "AB"
>>> s
[1, 2, 'A', 'B']

is not an error or ["B", "A"] == [1, 2][2:0:-1]. I think too much talk about the endpoints obscures this important fact. (I think I'd like it to be an error, since the interpretation of s[2:0] = t could reasonably be any of s[0:0] = t, s[1:1] = t, or s[2:2] = t, but I haven't thought carefully enough yet, and "backward compatibility".) Note: Josh's wording is already used in 3.7 (https://docs.python.org/dev/library/stdtypes.html#common-sequence-operations, as of the timestamp of this message). I didn't check whether it's been backported. -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue29352> ___
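The following sketch illustrates the behavior discussed above for a step-1 slice. `slice(i, j).indices(len(s))` clips out-of-range bounds, so anything below `-len(s)` becomes 0. When the clipped stop is less than the start, the slice is empty and sits at the start position, so assignment inserts there. The helper names are hypothetical.

```python
def effective_bounds(n, i, j):
    """Positions replaced by s[i:j] = t when len(s) == n (step 1 only)."""
    start, stop, _step = slice(i, j).indices(n)
    # An "inverted" slice such as s[2:0] is empty and located at start.
    return start, max(start, stop)


def assign_slice(seq, i, j, items):
    """Return a copy of seq after performing copy[i:j] = items."""
    result = list(seq)
    result[i:j] = items
    return result


s = [1, 2]
print(assign_slice(s, 2, 0, "AB"))       # [1, 2, 'A', 'B'], same as s[2:2] = "AB"
print(effective_bounds(len(s), 2, 0))    # (2, 2)
print(effective_bounds(len(s), -10, 1))  # (0, 1): -10 < -len(s) clips to 0
```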
[issue30138] Incorrect documentation of replacement of slice of length 0
Stephen J. Turnbull added the comment: Sorry, I just realized this note only applies to slices with a stride (k in i:j:k). Closing. -- stage: -> resolved status: open -> closed ___ Python tracker <http://bugs.python.org/issue30138> ___
[issue30138] Incorrect documentation of replacement of slice of length 0
New submission from Stephen J. Turnbull: In section 4.6.3, "Mutable Sequence Types", of the current documentation, Note 1 to the table says "[iterable] t must have the same length as the slice it is replacing." This is incorrect in the case of extension, s[len(s):] = t, according to both the rest of the documentation and experiment. -- assignee: docs@python components: Documentation keywords: easy messages: 292127 nosy: docs@python, sjt priority: normal severity: normal status: open title: Incorrect documentation of replacement of slice of length 0 type: behavior versions: Python 3.7 ___ Python tracker <http://bugs.python.org/issue30138> ___
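This example shows the distinction that the report and its closing note depend on. The equal-length requirement in Note 1 holds for extended slices, which have a step. A plain slice can be replaced by an iterable of any length, and that is how `s[len(s):] = t` extends a list. The function names are only for illustration.

```python
def extend_via_slice(s, t):
    """Plain slice: len(t) need not match the (empty) slice s[len(s):]."""
    s[len(s):] = t          # same effect as s.extend(t)
    return s


def replace_every_other(s, t):
    """Extended slice (step 2): t must have exactly len(s[::2]) items."""
    s[::2] = t              # raises ValueError on a length mismatch
    return s


print(extend_via_slice([1, 2, 3], [4, 5]))          # [1, 2, 3, 4, 5]
print(replace_every_other([1, 2, 3, 4, 5], "abc"))  # ['a', 2, 'b', 4, 'c']
try:
    replace_every_other([1, 2, 3], [9])             # s[::2] has 2 items
except ValueError as exc:
    print("ValueError:", exc)
```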
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: Thanks for the follow-up! I was just about to write to you, now that 3.6 is out. Season's Greetings! First, how do you propose to proceed with issue28115 ("use argparse for the ZipFile module")? If you expect to commit that first (I'm in no hurry for this patch, BTW; as long as it gets into 3.7 I'm happy), this issue should depend on it and use argparse too. I don't see any good reason for allowing a non-UTF-8 encoding for a file open for writing, and I do see a good reason (the Zip file format standard) for not allowing it. Certainly the CLI should not allow it, any more than it does now. At least in my experiments, InfoZip and the default zip utilities on Windows and Mac DTRT (do the right thing) with UTF-8 zipfiles, so there is no absolute need for writing nonconforming zipfiles. If you want to block on a convert-to-UTF-8 option, I can do that (but I don't need it myself). (Note to self: if writing to an existing zipfile extends the existing file, need to prevent mixed encodings. Also warn about conversion.) I thought I had checked that comments were decoded. Maybe that's only on the UTF-8 path? Or maybe I needed more coffee. (I hope so; it would be a messy problem if ASCII/Latin-1 returns bytes and UTF-8 returns str!) I'll think about this. Yes, it's a backward-compatibility issue, so it needs care. It would be weird if names were decoded but other metadata (comments) were not, though. Surely someone would complain if they actually used comments? (I'm thinking maybe a compatibility break might be OK? With a deprecation cycle?) I expect to check all execution paths accessing metadata and have a proposed patch by 12/31. I think I'm still short some tests; I'll check and write them if needed. -- ___ Python tracker <http://bugs.python.org/issue28080> ___
[issue28032] --with-lto builds segfault in many situations
Stephen J. Turnbull added the comment: FWIW, XEmacs has used a bit of m4 magic to make --with-* and --enable-* equivalent for 15 years, and nobody has ever complained. The autotools convention is a distinction without a difference, and confuses users when a program feature depends on an external library (especially where there are alternative implementations). -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue28032> ___
[issue28102] zipfile.py script should print usage to stderr
Stephen J. Turnbull added the comment: CA (contributor agreement) pending (I have received the PDF, but there is no star in the tracker yet). -- ___ Python tracker <http://bugs.python.org/issue28102> ___
[issue28103] Style fix in zipfile.rst
New submission from Stephen J. Turnbull: Makes the style of references to the open modes 'r', 'a', ... more consistent. CA (contributor agreement) pending (I have received the PDF, but there is no star in the tracker yet). -- assignee: docs@python components: Documentation files: zipfile-doc-style messages: 276058 nosy: docs@python, sjt priority: normal severity: normal status: open title: Style fix in zipfile.rst versions: Python 3.7 Added file: http://bugs.python.org/file44595/zipfile-doc-style ___ Python tracker <http://bugs.python.org/issue28103> ___
[issue28102] zipfile.py script should print usage to stderr
New submission from Stephen J. Turnbull: Pointed out by Serhiy Storchaka in a different context. -- components: Library (Lib) files: zipfile-errmsg keywords: patch messages: 276056 nosy: sjt priority: normal severity: normal status: open title: zipfile.py script should print usage to stderr versions: Python 3.7 Added file: http://bugs.python.org/file44594/zipfile-errmsg ___ Python tracker <http://bugs.python.org/issue28102> ___
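One way to get usage errors onto stderr without hand-written `print` calls is to let `argparse` report them. `ArgumentParser.error()` prints the usage message to standard error and exits with status 2. The parser below is hypothetical and pared down. It is not the actual command line of the `zipfile` module.

```python
import argparse


def build_parser():
    """Hypothetical, pared-down command line with two mutually exclusive modes."""
    parser = argparse.ArgumentParser(prog="zipfile")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-l", "--list", metavar="<zipfile>",
                       help="show listing of a zipfile")
    group.add_argument("-t", "--test", metavar="<zipfile>",
                       help="test if a zipfile is valid")
    return parser


def main(argv=None):
    # On a usage error, parse_args() calls parser.error(), which writes the
    # usage line and message to sys.stderr and exits with status 2, so
    # stdout stays clean for the listing itself.
    args = build_parser().parse_args(argv)
    return args
```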
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: Cleaned up a few loose ends while it's all fresh in my mind. I will ping python-dev in 4-6 weeks for review for 3.7. Thanks to Serhiy for the review. The current version of the patch is much improved over the initial submission thanks to his efforts. -- Added file: http://bugs.python.org/file44593/encoded-member-names-v3 ___ Python tracker <http://bugs.python.org/issue28080> ___
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: If you have a workaround that's available to nonprogrammers, I'd like to hear about it. I have found none; that's why I went to the trouble of putting together a patch even though I knew that the odds of actually getting it into Python 3.6 were very low. My patch (or Sergey Dorofeev's, but that needs work to be applicable to trunk) does everything I've ever needed, so I suppose it would do for all the use cases posted so far. The exception is umedoblock's encoding-guessing approach, but I think that can be handled by many third-party encoding-guessing codecs, and IMO it should be. -- ___ Python tracker <http://bugs.python.org/issue28080> ___
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: Can't reply on Rietveld? Lost 2 hours' work! Patch updated (encoded-member-names-v2), most changes accepted. I'm not happy about the name change or the default to cp437: I want this API to be hard to use and not to be part of the normal process (utf-8 or cp437). I'm considering an errors= argument, but that must default to 'strict'. The problem this patch solves is zip utilities extracting to files with unreadable names, and surrogateescape is more of the same. Two incomplete tests (assertRaisesRegex and capturing main() stderr) are still in progress; must do day job now. -- Added file: http://bugs.python.org/file44571/encoded-member-names-v2 ___ Python tracker <http://bugs.python.org/issue28080> ___
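The following minimal sketch shows why surrogateescape would be "more of the same". It assumes a Shift JIS member name that is wrongly decoded as UTF-8. The escaped str round-trips to the original bytes, but it contains lone surrogates. It therefore still cannot be encoded as a proper UTF-8 filename. Decoding with the right codec, by contrast, yields readable text.

```python
raw_name = "日本.txt".encode("shift_jis")   # legacy bytes from the archive

# Wrong codec + surrogateescape: decoding never fails...
escaped = raw_name.decode("utf-8", "surrogateescape")
# ...and the original bytes can be recovered exactly,
assert escaped.encode("utf-8", "surrogateescape") == raw_name
# but the name is full of lone surrogates (U+DC80..U+DCFF), not text.
print(any("\udc80" <= ch <= "\udcff" for ch in escaped))   # True

try:
    escaped.encode("utf-8")            # e.g. when creating the file on disk
except UnicodeEncodeError:
    print("not encodable as UTF-8")

# With the correct codec the name is readable.
print(raw_name.decode("shift_jis"))    # 日本.txt
```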
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: Re: wait for 3.7 if reviewers are busy, understood. N.B. Contributor agreement is now on file (I received the PDF from python.org already). Re: existing patches: My patch is very similar in the basic approach to Sergey Dorofeev's patch in issue10614. Main differences: (1) Sergey's patch treats the "encoding" parameter as a first-class citizen with a default of cp437, whereas mine treats it as a special case defaulting to None, with utf-8 and cp437 getting special treatment as the standard encodings. Subtle point, but I like it this way. (2) My patch includes support for the argument in the __main__ script. (3) Sergey's patch misses one execution path in the current code, so it needs an update before it can be applied. The Japanese patches by umedoblock are very Japanese-centric, and worse, they try to guess the encoding by the crude method of seeing what decodes successfully. They are not acceptable IMO. Aaargh. Just noticed the Japanese in test_zipfile.py. Will change it to use \u escapes soon. -- ___ Python tracker <http://bugs.python.org/issue28080> ___
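"Seeing what decodes successfully" is a crude guess because many byte strings decode without error under more than one codec. The result then depends on the order in which candidates are tried. Below is a minimal, hypothetical guesser that illustrates the problem. It is not umedoblock's actual code.

```python
def naive_guess(raw, candidates):
    """Return (text, codec) for the first candidate codec that doesn't raise."""
    for codec in candidates:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate codec could decode the bytes")


utf8_name = "café.txt".encode("utf-8")
text, codec = naive_guess(utf8_name, ["shift_jis", "utf-8"])
# The UTF-8 bytes 0xC3 0xA9 are also valid Shift JIS (two half-width
# katakana), so the guesser "succeeds" with the wrong codec.
print(codec, text == "café.txt")   # shift_jis False

# cp437 assigns a character to every byte value, so it never fails at all.
print(naive_guess(b"\x93\xfa", ["cp437", "shift_jis"])[1])   # cp437
```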
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: I should have a contributor agreement form on file. Ned Deily suggested that I try to get this patch in before the 12 noon deadline on Sept. 12, so here it is. I believe the patch is "safe" in the sense that its functionality needs to be explicitly enabled, and it should be very difficult to persuade it to inadvertently write to any file. No existing execution paths should be changed at all. -- ___ Python tracker <http://bugs.python.org/issue28080> ___
[issue28080] Allow reading member names with bogus encodings in zipfile
Stephen J. Turnbull added the comment: Suggested NEWS/whatsnew entry:

Add a new *memberNameEncoding* argument to the ZipFile constructor, allowing :mod:`zipfile` to read filenames in non-conforming encodings from the zipfile as Unicode. This implementation assumes all member names have the same encoding.

Motivation: There are applications in Japan that create zipfiles with directories containing filenames encoded in Shift JIS. There may be such software in other countries as well. As this is a violation of the Zip format definition, this library implements only an option to read such files.

Done:
(1) Add a memberNameEncoding argument to the main() function, which may be set from the command line with "--membernameencoding={codec}". This command line option may be used with -e or -l, but not -c or -t. There is no point to it in the latter, since the member names are not printed.
(2) Add a memberNameEncoding argument to the ZipFile constructor. This is the only way to set it, so this is global to the ZipFile.
(3) Add this attribute to repr.
(4) Add a check that the mode is `read` in main() and in the ZipFile constructor, and if not, invoke USAGE and exit or raise RuntimeError.
(5) When retrieving member names in constructing ZipInfo instances, check whether memberNameEncoding is set, and if so use it, unless the UTF-8 bit is set. In that case, obey the UTF-8 bit, as the specified encoding is surely a user error.
(6) Add a CODEC_USAGE message.
(7) Update the docs (docstrings, library reference, NEWS).
(8) Add tests:
    (a) List a zipfile's SJIS-encoded directory.
    (b) List a UTF-8-encoded directory and an ISO-8859-1-encoded directory as Shift JIS.
    (c) Check that USAGE is invoked on attempts to write a zipfile in main().
    (d) Check that an appropriate error is raised on attempts to write in other functions.
    Many other tests are run as well. ALL TESTS PASS.
(9) Docs build without error.

To do (?):
(10) NEWS/whatsnew
(11) Check that all relevant code paths are covered by tests.
(12) Review docs for clarity and organization.

Not done: I don't think these are appropriate/needed at this time, but they are listed in case somebody thinks otherwise.
(13) Add a subtype of RuntimeError (see 8d)?
(14) Issue a warning if both membernameencoding and the UTF-8 bit are set (see 5)?
(15) Support the InfoZip encoding extension mentioned in APPNOTE.TXT - .ZIP File Format Specification, v6.3.4.
(16) Support per-member encodings (I think the zipfile standard permits them, but I'm not sure).

-- keywords: +needs review status: pending -> open Added file: http://bugs.python.org/file44564/encoded-member-names ___ Python tracker <http://bugs.python.org/issue28080> ___
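Step (5) of the plan above can be sketched as follows. `decode_member_name` is a hypothetical helper, not the patch's code. It only encodes the precedence rule described there. The UTF-8 flag (general purpose bit 11, value 0x800, in the ZIP specification) wins. The caller-supplied encoding comes next, and the cp437 default applies last.

```python
UTF8_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8


def decode_member_name(raw, flag_bits, member_name_encoding=None):
    """Hypothetical helper: decode a raw member name from a directory entry."""
    if flag_bits & UTF8_FLAG:
        # The archive declares UTF-8 explicitly; a user-supplied encoding
        # would contradict it, so treat that as user error and ignore it.
        return raw.decode("utf-8")
    if member_name_encoding is not None:
        # Explicit opt-in for nonconforming archives (e.g. Shift JIS names).
        return raw.decode(member_name_encoding)
    # Neither flag nor override: the ZIP default, IBM code page 437.
    return raw.decode("cp437")
```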
[issue28080] Allow reading member names with bogus encodings in zipfile
Changes by Stephen J. Turnbull : -- components: Library (Lib) keywords: patch nosy: sjt priority: normal severity: normal status: open title: Allow reading member names with bogus encodings in zipfile type: enhancement versions: Python 3.6 ___ Python tracker <http://bugs.python.org/issue28080> ___
[issue27582] Mispositioned SyntaxError caret for unknown code points
Stephen J. Turnbull added the comment: I still think the easiest thing to do would be to make all non-ASCII characters instances of "invalid_character_token", self-delimiting in the same way that operators are. That would automatically point to exactly the right place in the token stream, and requires zero changes to the error handling code. I don't have time to look at the code, but I suspect that you could handle this exactly the same way that ? and $ are handled, and maybe even use the same token type. -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue27582> ___
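The following toy illustrates the suggestion above. It assumes a tokenizer that turns each code point it cannot otherwise classify into its own one-character INVALID token. Because such a token is self-delimiting, its column is exactly where the caret belongs. This is a hypothetical sketch, not CPython's tokenizer. It treats `?` and `$` the same way, and unlike the literal suggestion it still accepts non-ASCII identifiers.

```python
OPERATORS = set("+-*/%=<>()[]{}:,.")


def toy_tokenize(line):
    """Split line into (kind, text, column) tuples."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch.isidentifier():                # NAME start, incl. non-ASCII letters
            j = i + 1
            while j < len(line) and ("a" + line[j]).isidentifier():
                j += 1
            tokens.append(("NAME", line[i:j], i))
            i = j
        elif ch.isascii() and ch.isdigit():    # NUMBER (integers only)
            j = i + 1
            while j < len(line) and line[j].isascii() and line[j].isdigit():
                j += 1
            tokens.append(("NUMBER", line[i:j], i))
            i = j
        elif ch in OPERATORS:
            tokens.append(("OP", ch, i))
            i += 1
        else:
            # Anything else (e.g. '€', '?', '$') becomes a one-character
            # token, so the reported error position is exact.
            tokens.append(("INVALID", ch, i))
            i += 1
    return tokens


def caret_for_first_invalid(line):
    """Return the line plus a caret under the first INVALID token, or None."""
    for kind, _text, col in toy_tokenize(line):
        if kind == "INVALID":
            return line + "\n" + " " * col + "^"
    return None


print(caret_for_first_invalid("x = 1 \u20ac 2"))
```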
[issue27257] get_addresses results in traceback with a valid? header
Stephen J. Turnbull added the comment: OK, I can reproduce now.

$ python3.5
Python 3.5.0 (v3.5.0:374f501f4567, Sep 17 2015, 17:04:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> with open(b'lkml-exception.mail', mode = 'r') as f:
...     msg = email.message_from_file(f, policy=email.policy.SMTP)
...
>>> msg.get_all('to')
Traceback (most recent call last):

and (except for a slight skew in line numbering) the rest is the same as the tail of the OP. The crucial part is the policy=email.policy.SMTP argument, and evidently what's happening is that the parser assumes that the local-part of the addr-spec is non-empty. RFC 5322 does permit a quoted-string to be empty, so this is a bug in the email module's parser. (I don't have a patch, sorry.)

Aside: although strictly speaking it's hold-your-nose-and-avert-your-eyes legal according to RFC 5322, RFC 5321 (SMTP) does say:

    While the above definition for Local-part is relatively permissive, for maximum interoperability, a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form[...].

I don't see a good reason for the usage in the test case, so I'd call this nonconformant to RFC 5321. I think the right way to handle it is to register a defect but let the parse succeed. -- ___ Python tracker <https://bugs.python.org/issue27257> ___
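The following toy sketches the suggested handling and assumes only the quoted-string form of the local-part. An empty quoted-string (`""@example.com`) parses successfully, since RFC 5322 allows it, but a defect is recorded. This loosely follows the email package's practice of recording defects instead of raising. `parse_quoted_addr_spec` is hypothetical and is not the email package's parser. It ignores folding whitespace and does not validate the domain.

```python
def parse_quoted_addr_spec(addr):
    """Toy parser for '"local part"@domain'; returns (local, domain, defects)."""
    if not addr.startswith('"'):
        raise ValueError("this toy only handles quoted-string local-parts")
    chars = []
    i = 1
    while i < len(addr):
        ch = addr[i]
        if ch == "\\":                      # quoted-pair: next char is literal
            if i + 1 >= len(addr):
                raise ValueError("dangling backslash")
            chars.append(addr[i + 1])
            i += 2
        elif ch == '"':                     # closing DQUOTE
            break
        else:                               # qtext
            chars.append(ch)
            i += 1
    else:                                   # ran out without a closing DQUOTE
        raise ValueError("unterminated quoted-string")
    rest = addr[i + 1:]
    if not rest.startswith("@") or len(rest) == 1:
        raise ValueError("missing '@domain'")
    local = "".join(chars)
    defects = []
    if not local:
        # Legal per RFC 5322, discouraged by RFC 5321: record, don't raise.
        defects.append("empty quoted-string local-part")
    return local, rest[1:], defects


print(parse_quoted_addr_spec('""@example.com'))
```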
[issue27257] get_addresses results in traceback with a valid? header
Stephen J. Turnbull added the comment: In Python 3.5, both entering the problematic header by hand with a trivial body and using email.message_from_string to parse it, and calling email.message_from_file on lkml-exception.mail, produce an email.message.Message with no defects and no traceback. Without access to mail_filter.py, it's not clear what the defect might be. -- nosy: +sjt stage: -> test needed type: -> behavior ___ Python tracker <https://bugs.python.org/issue27257> ___
[issue24682] Add Quick Start: Communications section to devguide
Stephen J. Turnbull added the comment: I just reviewed again, and I agree it's ready for merge. I don't see any immediate need to add more. Unfortunately, I'm not a committer. -- ___ Python tracker <https://bugs.python.org/issue24682> ___
[issue24682] Add Quick Start: Communications section to devguide
Stephen J. Turnbull added the comment: I tend to disagree with Ezio about a FAQ for general questions. A pointer to appropriate alternatives for off-topic posts in the Mailman listinfo descriptions of the various lists (which can be copied into the devguide, or linked from there) will be sufficient for people who actually read such things before posting. OTOH, once there already is a misdirected post, I feel it's appropriate to say "This post is off-topic here because this list is for development of Python itself, not developing applications with Python. Posts like yours are ignored by almost all participants. You will get help (possibly better than you could get on this list) on pytho...@python.org." Adding a pointer to a FAQ which just repeats the same thing is browbeating IMO. It's not as if we lack several people who have macros to say the above (more politely than I did) and who typically respond within hours to off-topic posts. What more could a FAQ say? Of course this needs to be on-list, so that the poster (who is usually a little feckless rather than deliberately abusive) doesn't get spammed, and so that the multiple volunteers who handle these posts don't duplicate each other. I personally would like to see a guideline asking participants who want to offer advice on the question itself to do so off-list. Whatever one's opinion on the utility of offering advice in response to an off-topic post, such advice is as off-topic as the question that elicits it. -- ___ Python tracker <https://bugs.python.org/issue24682> ___
[issue24682] Add Quick Start: Communications section to devguide
Stephen J. Turnbull added the comment: If the mailing list code of conduct is to be fleshed out, Paul Moore's post is a good place to start IMO: https://mail.python.org/pipermail/python-dev/2015-July/140872.html. -- nosy: +sjt ___ Python tracker <https://bugs.python.org/issue24682> ___
[issue18814] Add codecs.convert_surrogateescape to "clean" surrogate escaped strings
Stephen J. Turnbull added the comment: Please do not add the "rehandle" functions to codecs. They do not change the (duck-typed) representation of data while maintaining the semantics; they change the semantics of data while retaining the representation. I suggest a "validation" submodule of the unicodedata package, or perhaps a new "unicodeutils" package, for these functions, as well as for those that just detect the surrogates, etc. Since they change the semantics of data, they should be documented as potentially dangerous: they can't be inverted back to bytes without knowledge of the history of transformations they perform (and not even then in the case of the "replace" error handler). This matters in applications where the input bytes may have been digitally signed, for example. -- nosy: +sjt ___ Python tracker <https://bugs.python.org/issue18814> ___
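The following demonstrates the non-invertibility argument. `rehandle` is a hypothetical stand-in for the proposed functions. It re-decodes a surrogateescape'd str with another error handler. Two different byte strings end up as the same "cleaned" str. The original bytes, and anything such as a signature computed over them, therefore cannot be recovered.

```python
def rehandle(s, errors="replace"):
    """Hypothetical: re-decode a surrogateescape'd str with another handler."""
    return s.encode("utf-8", "surrogateescape").decode("utf-8", errors)


a = b"caf\xe9".decode("utf-8", "surrogateescape")   # 'caf\udce9'
b = b"caf\xe8".decode("utf-8", "surrogateescape")   # 'caf\udce8'

# Before rehandling, each str still maps back to its own bytes.
print(a.encode("utf-8", "surrogateescape"))          # b'caf\xe9'

# After rehandling with "replace", both become 'caf\ufffd': information lost.
print(rehandle(a) == rehandle(b))                   # True
```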
[issue14983] email.generator should always add newlines after closing boundaries
Stephen J. Turnbull added the comment: Following OpenPGP convention is clearly optional (or maybe a SHOULD, but the word "elect" makes it a pretty weak SHOULD). RFC 2046 is a MUST; it's not a matter of "convention". The problem is that a parser that works forward through the message will swallow the terminating CRLF of the boundary of the signed multipart, and then not find a CRLF to introduce the boundary that separates the content from the signature. By MIME rules it will treat the signature (including the unrecognized boundary) as an epilogue and ignore it. This is not at all special to multipart/signed. -- ___ Python tracker <https://bugs.python.org/issue14983> ___
[issue14983] email.generator should always add newlines after closing boundaries
Stephen J. Turnbull added the comment: It seems this hasn't been resolved. I have to disagree with David's interpretation of RFC 2046. The definition of a boundary says that it is "terminated" with a CRLF. It also clarifies that the introducing CRLF is "conceptually part" of the boundary. Thus each boundary contains both the leading and the trailing CRLF. There is no exception for the final boundary that I can see. This implies that when two boundaries abut, they need to be separated by *two* CRLFs: the trailing CRLF on the ending boundary of the inner multipart and the leading CRLF on the next boundary (which might be a separator or the ending boundary) of the containing multipart. -- nosy: +sjt ___ Python tracker <https://bugs.python.org/issue14983> ___
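The following minimal sketch is a generator that follows the reading of RFC 2046 argued above. It always emits the CRLF that terminates a close-delimiter. As a result, a nested multipart's `--inner--` line is followed by two CRLFs before the outer delimiter. `serialize_multipart` is hypothetical and only joins already-serialized parts. It is not the email.generator API.

```python
CRLF = "\r\n"


def serialize_multipart(boundary, parts):
    """Join already-serialized body parts using RFC 2046 delimiters.

    Each delimiter is CRLF + "--" + boundary (the leading CRLF is
    conceptually part of the boundary), and the close-delimiter is always
    terminated by its own CRLF.
    """
    body = "--" + boundary + CRLF
    body += (CRLF + "--" + boundary + CRLF).join(parts)
    body += CRLF + "--" + boundary + "--" + CRLF
    return body


inner = serialize_multipart(
    "inner", ["Content-Type: text/plain" + CRLF + CRLF + "signed text"])
outer = serialize_multipart("outer", [
    "Content-Type: multipart/mixed; boundary=inner" + CRLF + CRLF + inner,
    "Content-Type: application/pgp-signature" + CRLF + CRLF + "SIG",
])
# The inner close-delimiter's own CRLF, then the outer delimiter's leading CRLF.
print("--inner--\r\n\r\n--outer" in outer)   # True
```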
[issue18891] Master patch for content manager addtion to email package.
Stephen J. Turnbull added the comment: I'm thinking this may be overengineering, but I may as well post it and find out for sure. :-) Is it worth encapsulating MIME types? They're "really" pairs as far as mail handling applications are concerned, but they have a string representation. So

MacPorts 16:29$ python3.3
Python 3.3.2 (default, May 21 2013, 11:48:51)
>>> from collections import namedtuple
>>> class MIMEType(namedtuple('MIMETYPE', 'type subtype')):
...     def __str__(self):
...         return "{0}/{1}".format(self.type, self.subtype)
...
>>> mt = MIMEType('text', 'plain')
>>> str(mt)
'text/plain'
>>> t, s = mt
>>> print('type =', t, 'subtype =', s)
type = text subtype = plain
>>>

Obviously there needs to be a constructor that handles the 'type/sub' representation. -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue18891> ___
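One possible shape for the constructor the comment asks for is a `from_string` classmethod (a hypothetical name) on the session's `MIMEType`. It lower-cases both halves, since MIME type and subtype names are case-insensitive. It deliberately leaves parameters such as `; charset=utf-8` out of scope.

```python
from collections import namedtuple


class MIMEType(namedtuple("MIMEType", "type subtype")):
    """A (type, subtype) pair whose str() is 'type/subtype'."""

    __slots__ = ()

    def __str__(self):
        return "{0}/{1}".format(self.type, self.subtype)

    @classmethod
    def from_string(cls, text):
        """Build a MIMEType from 'type/subtype' (no parameters)."""
        maintype, sep, subtype = text.strip().partition("/")
        if not (sep and maintype and subtype) or "/" in subtype:
            raise ValueError("not a 'type/subtype' string: {!r}".format(text))
        return cls(maintype.lower(), subtype.lower())


mt = MIMEType.from_string("Text/Plain")
print(str(mt), mt == ("text", "plain"))   # text/plain True
```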
[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689
Stephen J. Turnbull added the comment: Yeah, hope is a good thing. But I've spent the last 20 years debugging an X11 application based on a Lisp interpreter; I save hope for fireflies, my dog, and my daughter these days. :-) To the OP: I don't follow Gentoo closely, but I have acquaintances who do. Between them and the occasional foray into the forums, I've gotten the impression that providing CFLAGS for optimization is associated with hard-to-debug problems. They increase performance noticeably only in a few applications. Python being a dynamic language, function calls and even variable references can be quite inefficient anyway. So I see no good reason to compile Python with aggressive CFLAGS: it should be used only for moderately performance-sensitive applications, as "glue code", and to provide UI. Instead, use them only for the specific applications that benefit (I suppose matplotlib *might* be one). Second, I tend to agree with the maintainers. The packages.env / pydebug.conf approach is the right thing for this kind of variant build. Third, you said you hoped to get better backtraces from --with-pydebug. That's a vain hope. Such options are intended to get better backtraces of C code from core dumps where the interpreter breaks down, not of Python code induced by Python exceptions caused by problems in user code. If you have trouble interpreting a backtrace, ask on python-l...@python.org or comp.lang.python (they mirror each other; you only need one). If, after understanding the backtrace, you have an idea for a way to get a better backtrace in this case, you can suggest it on python-id...@python.org. Unfortunately, reporting "this backtrace is unintelligible, please improve it" as an RFE on the tracker is likely to get the reply "You're right, but we don't know how at this time. Patches welcome!" But you could try that if all else fails. -- ___ Python tracker <http://bugs.python.org/issue18843> ___
[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689
Stephen J. Turnbull added the comment: OK, I backed off the aggressive CFLAGS/CXXFLAGS to " -ggdb -pipe", and ran "emerge =dev-lang/python-2.7.5-r1" *once* each with and without the 'EXTRA_ECONF="--with-pydebug"' flag. Compiled with GCC 4.7.3. No crash, same test results as described previously for GCC 4.6.4. If you have other suggestions, let me know. -- ___ Python tracker <http://bugs.python.org/issue18843> ___
[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689
Stephen J. Turnbull added the comment: I did "emerge =dev-lang/python-2.7.5-r1" *twice* with the environment configuration described in msg196520, then *once* with it disabled because one of the cases you described was when you tried to revert to a non-debug Python. (Besides, I am willing to risk your crash while I'm watching for it, but not a time bomb that will go off when I'm on deadline :-). All builds succeeded and all passed the test suite. Here's how the debug build describes itself: == CPython 2.7.5 (default, Sep 1 2013, 00:59:02) [GCC 4.6.4] == Linux-3.9.0-x86_64-Dual_Core_AMD_Opteron-tm-_Processor_265-with-gentoo-2.2 little-endian The test suite ran uneventfully (with a few DeprecationWarnings) except for this: 6 skips unexpected on linux2: test_bsddb test_bsddb3 test_tcl test_tk test_ttk_guionly test_ttk_textonly but I suppose that is expected on Gentoo. If any of those modules (bsddb, tcl, tk) are built into your Python, a problem in one of those might be the culprit. Oh, damn. I just reread the whole thread. For some reason I thought you were using gcc 4.6.4, but now I see you report 4.7.3. OK build with 4.7.3 and your flags (also restore the --with-pydebug config): # export CFLAGS=" -ggdb -pipe -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx -maes -mpclmul -mpopcnt" # export CXXFLAGS="${CFLAGS}" # export CC=gcc-4.7.3 and we crash (from make output) immediately after linking ./python: x86_64-pc-linux-gnu-ranlib libpython2.7.a gcc-4.7.3 -pthread -Wl,--hash-style=gnu -Wl,-O1 -Wl,--as-needed -L. -Xlinker -export-dynamic -o python \ Modules/python.o \ -L. -lpython2.7 -lpthread -ldl -lutil -lm LD_LIBRARY_PATH=/var/tmp/portage/dev-lang/python-2.7.5-r1/work/x86_64-pc-linux-gnu: ./python -E -S -m sysconfig --generate-posix-vars make: *** [pybuilddir.txt] Illegal instruction However, I'm pretty sure this is due to my hardware not liking your -m flags, not the crash you reported. 
I'll try backing those flags out, but if anybody has a suggestion for the most aggressive set similar to yours, I'd appreciate it. But first this process is going to go sleep(25200). -- ___ Python tracker <http://bugs.python.org/issue18843>
[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689
Stephen J. Turnbull added the comment: I have a Gentoo host, but I'd like to know how the OP got a debug Python in the first place. The ebuild for python 2.7.5-r1 doesn't say anything about debug options. "How" would preferably include information about the C compiler used, etc. If there's no information, I can probably just hack --with-pydebug into a local ebuild, but I have something like 7 C compilers installed, so I'd really like a good guess at the right one. Also, did he add any optimization flags etc. when building the debug Python? (ebuild = the emerge configuration file that describes the build and install process for a package. In "2.7.5-r1" the "r1" indicates the first revised ebuild for the same upstream version.) -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue18843>
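A minimal sketch of how a reporter could answer the "how was this built" question from inside the interpreter, assuming a reasonably modern Python 3. `platform.python_compiler()` and `sysconfig.get_config_var()` are standard library calls; treating the presence of `sys.gettotalrefcount` as the sign of a `--with-pydebug` build is a common heuristic rather than a documented guarantee, and the `build_report` helper is hypothetical.

```python
import platform
import sys
import sysconfig

def build_report():
    """Hypothetical helper: collect the build facts asked about above."""
    return {
        # Heuristic: sys.gettotalrefcount only exists in Py_DEBUG builds.
        "debug_build": hasattr(sys, "gettotalrefcount"),
        # Compiler string recorded at build time, e.g. "GCC 4.6.4".
        "compiler": platform.python_compiler(),
        # configure arguments and CFLAGS; may be None where no Makefile
        # variables are recorded (e.g. some Windows builds).
        "config_args": sysconfig.get_config_var("CONFIG_ARGS"),
        "cflags": sysconfig.get_config_var("CFLAGS"),
    }

if __name__ == "__main__":
    for key, value in build_report().items():
        print(f"{key}: {value}")
```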
[issue18606] Add statistics module to standard library
Stephen J. Turnbull added the comment: A few small comments and nits.

1. I'm with the author on the question of a sum function in this module. The arguments that builtin sum isn't accurate enough, and neither is math.fsum for cases where all data is of infinite precision, are enough for me.
2. A general percentile function should be high on the list of next additions.

A substantive question:

3. Can't add_partial be used in the one-pass algorithms?

Several typos and suggested style tweaks:

4. I would find the summary more readable if grouped by function: add_partial, sum, StatisticsError; mean, median, mode; pstdev, pvariance, stdev, variance. Maybe I'd like it better if the utilities came last. IMO YMMV, of course.
5. In the big comment in add_partial, "the inner loop" is mentioned. Indeed this is the inner loop in statistics.sum, but there's only one loop in add_partial.
6. In the Limitations section of sum's docstring it says "these limitations may change". Is "these limitations may be relaxed" what is meant? I would hope so, but the current phrasing makes me nervous.
7. In sum, there are two comments referring to the construct "type(total).__float__(total)", with the first being a forward reference to the second. I would find a single comment above the "isinstance(total, float)" test more readable. E.g.,

       """
       First, accumulate a non-float sum. Until we find a float, we keep adding.
       If we find a float, we exit this loop, convert the partial sum to float,
       and continue with the float code below. Non-floats are converted to float
       with 'type(x).__float__(x)'. Don't call float() directly, as that converts
       strings and we don't want that. Also, like all dunder methods, we should
       call __float__ on the class, not the instance.
       """

8. The docstrings for mean and variance say they are unbiased. This depends on the strong assumption of a representative (typically i.i.d.) sample. I think this should be mentioned.
9. Several docstrings say "this function should be used when ...". In fact the choice of which function to use is somewhat delicate. My personal preference would be to use "may" rather than "should."
10. In several of the mode functions, the value is a sorted sequence. The sort key should be specified, because it could be the data value or the score.

-- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue18606>
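The accumulation strategy described in item 7 can be sketched as follows. This is a hypothetical `mixed_sum`, not the statistics module's actual sum or add_partial: it keeps an exact running total until the first float appears, converts the partial total with `type(total).__float__(total)`, and finishes with `math.fsum`, which is a swapped-in choice for the float phase. Calling `__float__` on the class means a string raises instead of being parsed, as `float()` would do.

```python
import math
from fractions import Fraction

def mixed_sum(data):
    """Hypothetical sketch: exact sum until a float appears, then a float sum."""
    total = 0
    items = iter(data)
    for x in items:
        if isinstance(x, float):
            # Convert via the class, not the instance, so str and other
            # non-numbers raise AttributeError instead of being parsed.
            floats = [type(total).__float__(total), x]
            floats.extend(type(y).__float__(y) for y in items)
            return math.fsum(floats)
        total += x  # int and Fraction arithmetic stays exact
    return total
```

For example, `mixed_sum([1, Fraction(1, 10**20)])` returns exactly `Fraction(10**20 + 1, 10**20)`, whereas `math.fsum` on the same data rounds to `1.0`, which is the accuracy argument from item 1.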
[issue10686] email.Generator should use unknown-8bit encoded words for headers with 8 bit data
Stephen J. Turnbull added the comment: I agree with you that according to RFC1428, use of unknown-8bit is implicitly recommended. However, note that the RFC itself is not standards-track. I agree with your interpretation that in this context the email module should be considered a gateway. I think it is certainly best to convert to MIME words, as you say. However, if there isn't one already, maybe there should be an option to bounce such headers back to the user? That is, in an interactive application this should be an error. Of course we should help the user by allowing and documenting (perhaps even defaulting to) whatever we choose for the unknown encoding. I don't recall ever seeing unknown-8bit in the wild. What I do see in the wild a lot, and specifically in Mailman moderation traffic, is simply "unknown". A quick google for "unknown-8bit" pulled up some old (2002) discussion of unknown-8bit causing problems for some MTAs. I didn't follow up to see what those were. I don't have time to do it myself today (but would be willing to help out if you can wait up to two weeks -- I have travel coming up), but I suggest checking for IANA registration of "unknown" and "unknown-8bit". -- ___ Python tracker <http://bugs.python.org/issue10686>
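A minimal sketch of the gateway behavior discussed above, assuming raw header bytes as input. The `encode_8bit_header` helper and its `strict` flag are hypothetical, not part of the email package: pure-ASCII values pass through, and anything else is either rejected, as an interactive application might prefer, or wrapped in an RFC 2047 base64 encoded word labelled `unknown-8bit`. It ignores the 75-character limit on a single encoded word, so long values would still need to be split into several words.

```python
import base64

def encode_8bit_header(raw: bytes, strict: bool = False) -> str:
    """Hypothetical helper: make a header value with unknown 8-bit bytes ASCII-safe."""
    try:
        return raw.decode("ascii")  # plain ASCII needs no encoding
    except UnicodeDecodeError:
        if strict:
            # Interactive use: bounce the problem back to the user.
            raise ValueError("header contains 8-bit data in an unknown charset") from None
    encoded = base64.b64encode(raw).decode("ascii")
    return f"=?unknown-8bit?b?{encoded}?="
```

For example, `encode_8bit_header(b"caf\xe9")` returns `'=?unknown-8bit?b?Y2Fm6Q==?='`, and `email.header.decode_header` on that string gives back the original bytes together with the charset label `'unknown-8bit'`.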
[issue8863] Display Python backtrace on SIGSEGV, SIGFPE and fatal error
Stephen J. Turnbull added the comment: Re: msg124528 Yes, XEmacs installs a signal handler for what are normally fatal errors. (I don't know about GNU Emacs, but they probably do too.) The handler has two functions: to display a Lisp backtrace and to output a message explaining how to report bugs (even including a brief introduction to the "bt" command in gdb. ;-) I personally have never found the Lisp backtrace useful, except that it can be used as a bug signature of sorts ("oh, I think I've seen this one before..."). However, I suspect this is mostly because in Emacs Lisp very often you don't have the name of the function in the backtrace, only a compiled code object. So in many cases it's almost no help in localizing the fault. Victor's patch does a lot better on this than XEmacs can, I suspect. The bug reporting message, OTOH, has been useful to us for the reasons people give for wanting the handler installed by default. We get more and better bug reports, often including C backtraces, from people who have never participated directly in XEmacs development before. (It also once served the function of inhibiting people from sending us core files. Fortunately, I don't think that happens much any more. :-) Occasionally a user will be all proud of themselves because "I never used gdb before," so I'm pretty sure that message is effective. Quite frequently we see the handler itself crash if there was memory corruption, but it certainly gives useful output well over half the time. So I want to back up Victor on those aspects. Finally, although our experience has been very positive, note that XEmacs is not an embeddable library, nor is there provision in the mainline versions for embedding other interpreters in XEmacs. So we've never had to worry about the issues that come with that.
For more technical details, you could ask Ben Wing, who put a lot of effort into the signal handling implementation, or Hrvoje Niksic (sorry, no address offhand), who posts on python-dev occasionally. (I don't know if Hrvoje ever worked on the signal handlers, and he hasn't worked on XEmacs for years, but he was more familiar with internals than me then, and might very well still remember more than I ever knew. :-) I don't think either will disagree with my general statements above, though. -- ___ Python tracker <http://bugs.python.org/issue8863>
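For comparison, Python 3.3 and later ship a standard `faulthandler` module that installs this kind of handler for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL. Below is a minimal sketch of turning it on and of what its dump looks like. The `crash_reporting` and `sample_dump` helpers are hypothetical, and unlike the XEmacs handler, faulthandler prints only the Python traceback, with no bug-reporting instructions.

```python
import faulthandler
import sys
import tempfile

def crash_reporting(all_threads=True):
    """Hypothetical helper: dump Python tracebacks on fatal signals."""
    # faulthandler writes to the file's descriptor directly, so it needs
    # a real stream with a fileno(); sys.__stderr__ is the original stderr.
    faulthandler.enable(file=sys.__stderr__, all_threads=all_threads)

def sample_dump():
    """Return the text faulthandler writes for the current thread's stack."""
    with tempfile.TemporaryFile("wb+") as fp:
        faulthandler.dump_traceback(file=fp, all_threads=False)
        fp.seek(0)
        return fp.read().decode("utf-8", "replace")

if __name__ == "__main__":
    crash_reporting()
    print(sample_dump())
```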
[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally
Changes by Stephen J. Turnbull : -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue9873>
[issue6315] locale._build_localename(locale.getdefaultlocale()) returns 'C.mac-roman'
New submission from Stephen J. Turnbull : This causes the locale machinery to raise exceptions and, usually, the program to die (e.g., hg). This manifests naturally on an Intel Mac, Mac OS X 10.5.7, but the problem behavior is in _build_localename. When called as _build_localename((None,'any_string')) it returns 'C.any_string'. I don't know of any system that supports anything but the POSIX portable character set in the C/POSIX locale, so this is clearly wrong. I suggest that when the first component of the argument is None, the second component should be ignored. Probably my Mac is misconfigured, but I think this is still a bug that should be fixed. Observed in all of 2.5.4, 2.6.2, and 3.0.1 (vanilla MacPorts builds). References: It's possible this is related to issue1699853, issue1176504, issue504219, but I don't think fixing this will help with those issues. It is not related to issue3067. -- components: Library (Lib) messages: 89537 nosy: sjt severity: normal status: open title: locale._build_localename(locale.getdefaultlocale()) returns 'C.mac-roman' type: behavior versions: Python 2.5, Python 2.6, Python 3.0 ___ Python tracker <http://bugs.python.org/issue6315>
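The suggested fix can be sketched as below. `build_localename` is a hypothetical stand-in written for illustration, not CPython's private `locale._build_localename`; it encodes only the rule proposed above: with no language code, return plain 'C' and drop the encoding, since the C/POSIX locale only guarantees the portable character set.

```python
from typing import Optional, Tuple

def build_localename(localetuple: Tuple[Optional[str], Optional[str]]) -> str:
    """Hypothetical sketch of the suggested _build_localename behavior."""
    language, encoding = localetuple
    if language is None:
        # Ignore the encoding: a name such as 'C.mac-roman' is what made
        # the locale machinery raise in this report.
        return "C"
    if encoding is None:
        return language
    return language + "." + encoding
```

With this rule, `build_localename((None, "mac-roman"))` returns `"C"`, while `build_localename(("en_US", "UTF-8"))` still returns `"en_US.UTF-8"`.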
[issue5252] 2to3 should detect and delete import of removed statvfs module
Stephen J. Turnbull added the comment: Benjamin Peterson writes:

> Hmm. 2to3 doesn't currently mess with the stat module and os.stat the
> more common function. Also the new interface (attributes on the objects
> returned) has been around since 2.2.

So what? You *can't* import a nonexistent module, so the import statement should be removed to save the programmer the trouble. ___ Python tracker <http://bugs.python.org/issue5252>
[issue2899] Fixers find, rfind, etc in 'string' module
Stephen J. Turnbull added the comment: Maybe 2to3 could get a --pedantic or even an --annoying option? I agree that it should be noisy about removed features even if actually fixing this kind of thing would be hard to do reliably. -- nosy: +sjt ___ Python tracker <http://bugs.python.org/issue2899>
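A rough idea of what a noisy "--pedantic" check might do, sketched with the standard `ast` module rather than 2to3's fixer machinery. The `REMOVED_STRING_FUNCS` set is a hypothetical, deliberately incomplete sample of Python 2 `string` module functions that no longer exist in Python 3. The check only recognizes calls spelled `string.<name>(...)`, so aliased imports slip through, and because `ast` parses Python 3 syntax, sources with Python 2-only constructs such as print statements will not parse. It reports; it does not rewrite.

```python
import ast

# Hypothetical sample, not exhaustive: Python 2 string-module functions
# removed in Python 3 in favor of str methods.
REMOVED_STRING_FUNCS = {"find", "rfind", "index", "rindex", "split",
                        "join", "lower", "upper", "strip", "replace"}

def find_removed_string_calls(source):
    """Return sorted (line, name) pairs for calls like string.find(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "string"
                and node.func.attr in REMOVED_STRING_FUNCS):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)
```

For example, `find_removed_string_calls("import string\nn = string.find(s, 'x')\nw = string.capwords(s)\n")` returns `[(2, 'find')]`, since `string.capwords` survived into Python 3.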
[issue5252] 2to3 should detect and delete import of removed statvfs module
New submission from Stephen J. Turnbull : It should also try to convert stuff like

    import os
    from statvfs import F_BAVAIL, F_FRSIZE
    status = os.statvfs(directory)
    available = status[F_BAVAIL]/((1024*1024)/status[F_FRSIZE])

-- components: 2to3 (2.x to 3.0 conversion tool) messages: 81959 nosy: sjt severity: normal status: open title: 2to3 should detect and delete import of removed statvfs module type: feature request versions: Python 3.0 ___ Python tracker <http://bugs.python.org/issue5252>
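For reference, here is a minimal sketch of what the converted snippet could look like in Python 3, where the statvfs module is gone and `os.statvfs()` returns a result object with named attributes. `available_mib` is a hypothetical helper name. The arithmetic is the original expression rearranged (blocks available to unprivileged users times fragment size, in MiB) so that no intermediate integer division is needed, and `os.statvfs` is only available on Unix.

```python
import os

def available_mib(directory):
    """Hypothetical helper: space available to unprivileged users, in MiB."""
    status = os.statvfs(directory)
    # f_bavail and f_frsize replace status[F_BAVAIL] and status[F_FRSIZE].
    return status.f_bavail * status.f_frsize / (1024 * 1024)
```

Because the result object is still a sequence, `status[4]` and `status.f_bavail` name the same field, which is what would let a fixer map the old `F_*` constants onto attribute names.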