[issue28080] Allow reading member names with bogus encodings in zipfile

2022-03-23 Thread Stephen J. Turnbull


Stephen J. Turnbull added the comment:

I'm not going to have time to look at the PR for a couple days.

I don't understand what the use case is for writing or appending with filenames 
in a non-UTF-8 encoding.  At least in my experience, reading such files is 
rare, but I have never been asked to write one.  The correspondents who send me 
zipfiles with the directory encoded in shift_jisx0213 are perfectly happy to 
read zipfiles with the directory encoded in UTF-8.

If that is true for other users, then unzipping the file to a temporary 
directory with the appropriate --metadata-encoding, adding the required paths 
there, and zipping a new archive seems perfectly workable.  In that case making 
this feature read-only makes the most sense to me.
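
The unzip-and-rezip workflow can also be done in memory.  What follows is a
minimal sketch, not the approach taken in the PR.  It assumes that the zipfile
module decodes member names as cp437 when the UTF-8 flag bit (0x800) is not
set, so the raw bytes can be recovered by re-encoding as cp437.  The function
name `repack_as_utf8` and its parameters are hypothetical.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: member name is UTF-8


def repack_as_utf8(src, dst, encoding):
    """Copy every member of src into a new archive dst with UTF-8 names.

    `encoding` is the legacy codec the names were really written in
    (for example "shift_jis").  Timestamps and permissions are not
    preserved in this sketch.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                # zipfile decoded the raw bytes as cp437; undo that and
                # decode with the codec the archive's creator used.
                name = name.encode("cp437").decode(encoding)
            # Non-ASCII names are written as UTF-8 with the flag bit set.
            zout.writestr(name, zin.read(info))
```

Both `src` and `dst` may be paths or file-like objects, which is what ZipFile
itself accepts.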

--

___
Python tracker 
<https://bugs.python.org/issue28080>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39673] Map errno==ETIME to TimeoutError

2020-05-25 Thread Stephen J. Turnbull


Stephen J. Turnbull added the comment:

First, let me say I like Giampaolo's TimeoutExpired *much* better as the name 
for this kind of exception!  But that ship has sailed.

I don't understand Giampaolo's comment.  If I understand the claim correctly, 
the problem is that people who should be catching some application-specific 
exception may be misled into catching TimeoutError instead, or into trying to 
get application-specific attributes from TimeoutError.  But that ship sailed 
with the creation of TimeoutError.  (We have a whole fleet sailing with this 
exception.)  Unless Giampaolo is proposing to deprecate TimeoutError?  I'm 
sympathetic ;-), but deprecation is a PITA and takes forever.

If we're not going to deprecate, it seems to me that it's much more 
developer-friendly to catch ETIME with TimeoutError, as that seems very likely 
to be the expected behavior.  It's true that even if Giampaolo changes 
TimeoutExpired to subclass TimeoutError, generic TimeoutError won't have 
.seconds.  But if you catch a TimeoutExpired with TimeoutError, that instance 
*will* have .seconds, and if you try to get .seconds on generic TimeoutError, 
you'll get a different uncaught exception (AttributeError vs. TimeoutError), 
but that TimeoutError wouldn't have been handled by catching TimeoutExpired.
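
The .seconds argument can be sketched as follows.  The `TimeoutExpired` class
here is a hypothetical stand-in that subclasses TimeoutError (psutil's real
class need not look like this), and `describe` is an illustrative helper.

```python
class TimeoutExpired(TimeoutError):
    """Hypothetical application-level timeout that records its duration."""

    def __init__(self, seconds):
        super().__init__(f"timeout after {seconds} seconds")
        self.seconds = seconds


def describe(exc):
    # getattr with a default avoids the AttributeError that a plain
    # TimeoutError would raise on `.seconds`.
    seconds = getattr(exc, "seconds", None)
    return "timed out" if seconds is None else f"timed out after {seconds}s"


for make in (lambda: TimeoutExpired(5), lambda: TimeoutError("generic")):
    try:
        raise make()
    except TimeoutError as exc:  # catches both classes
        print(type(exc).__name__, "->", describe(exc))
```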

I agree with Eric that people who were distinguishing OSError with .errno=ETIME 
from TimeoutError might be at risk, but I wouldn't do that: if I were going to 
be distinguishing particular OSErrors on the basis of errno (other than in 
"Unexpected OSError (errno = %d)" reporting style), I'd just catch OSError and 
do that.  On the other hand, I might expect TimeoutError to catch ETIME.  And 
Giampaolo says he's never seen either.  I suppose the author of psutil would be 
as likely as anyone to have seen it!
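
For code that wants to treat both cases as a timeout today, one option is to
catch OSError and inspect errno directly.  This is only a sketch: `is_timeout`
is a hypothetical helper, and it is written so that it does not depend on
whether ETIME has been mapped to TimeoutError in the running interpreter.

```python
import errno

# ETIME is not defined on every platform, so look it up defensively.
ETIME = getattr(errno, "ETIME", None)


def is_timeout(exc):
    """Return True for TimeoutError or an OSError whose errno is ETIME."""
    if isinstance(exc, TimeoutError):
        return True
    return ETIME is not None and isinstance(exc, OSError) and exc.errno == ETIME
```

Constructing `OSError(errno.ETIMEDOUT, "...")` already yields a TimeoutError
instance, so the first branch covers that case.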

On net (unless we go the deprecation route) it seems that the convenience and 
"intuition" of adding ETIME to TimeoutError outweighs that risk.

I wish there were somebody who was there at the creation of ETIME!

--
nosy: +sjt

___
Python tracker 
<https://bugs.python.org/issue39673>
___



[issue29352] provide the authorative source for s[i:j] negative slice indices (<-len(s)) behavior for standard sequences

2017-04-22 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I prefer Josh's wording.  The important point to me is that

>>> s = [1, 2]
>>> s[2:0] = "AB"
>>> s
[1, 2, 'A', 'B']

not an error or ["B", "A"] == [1, 2][2:0:-1].  I think too much talk about the 
endpoints obscures this important fact.  (I think I'd like it to be an error, 
since the interpretation of s[2:0] = t could reasonably be any of s[0:0] = t, 
s[1:1] = t, or s[2:2] = t, but I haven't thought carefully enough yet, and 
"backward compatibility".)

Note: Josh's wording is already used in 3.7 
(https://docs.python.org/dev/library/stdtypes.html#common-sequence-operations, 
as of the timestamp of this message).  I didn't check if it's been backported.

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue29352>
___



[issue30138] Incorrect documentation of replacement of slice of length 0

2017-04-22 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Sorry, I just realized this note only applies to slices with a stride (k in 
i:j:k).  Closing.
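
A short illustration of the distinction, using only built-in list behavior:
assignment to an extended slice (one with a stride) requires a replacement of
the same length, while assignment to a plain slice does not.

```python
s = list(range(6))

# Extended slice (with a stride): the replacement must match in length.
s[::2] = "abc"
print(s)  # ['a', 1, 'b', 3, 'c', 5]

try:
    s[::2] = "ab"  # three slots, two items
except ValueError as exc:
    print("ValueError:", exc)

# Plain slice: the replacement may be longer or shorter.
s[1:3] = "wxyz"
print(s)  # ['a', 'w', 'x', 'y', 'z', 3, 'c', 5]
```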

--
stage:  -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue30138>
___



[issue30138] Incorrect documentation of replacement of slice of length 0

2017-04-22 Thread Stephen J. Turnbull

New submission from Stephen J. Turnbull:

In section 4.6.3. "Mutable Sequence Types" of current documentation, Note 1 to 
the table says "[iterable] t must have the same length as the slice it is 
replacing." This is incorrect in the case of extension, s[len(s):] = t, 
according to both the rest of the documentation and experiment.
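
The experiment is easy to repeat.  The slice being replaced is empty, yet the
assignment accepts an iterable of any length:

```python
s = [1, 2]
t = "AB"

# The slice s[len(s):] is empty, yet assigning to it appends all of t.
s[len(s):] = t
print(s)  # [1, 2, 'A', 'B']

# Same effect as extend().
u = [1, 2]
u.extend(t)
print(s == u)  # True
```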

--
assignee: docs@python
components: Documentation
keywords: easy
messages: 292127
nosy: docs@python, sjt
priority: normal
severity: normal
status: open
title: Incorrect documentation of replacement of slice of length 0
type: behavior
versions: Python 3.7

___
Python tracker 
<http://bugs.python.org/issue30138>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-12-27 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Thanks for followup!  I was just about to write you, now that 3.6 is out.  
Season's Greetings!

First, how do you propose to proceed with issue28115 ("use argparse for the 
ZipFile module")?  If you expect to commit that first (I'm in no hurry for this 
patch, BTW, as long as it gets into 3.7 I'm happy), this issue should depend on 
it and use argparse too.

I don't see any good reason for allowing non-UTF-8 encoding to a file open for 
writing, and a good reason (the ZipFile standard) for not allowing it.  
Certainly the CLI should not allow it, any more than it does now.  At least in 
my experiments InfoZip and the default zip utilities on Windows and Mac DTRT 
with UTF-8 zipfiles, so there is no absolute need for writing nonconforming 
zipfiles.  If you want to block on a convert-to-UTF-8 option I can do that (but 
I don't need it myself).  (Note to self: if writing to existing zipfile is 
extension of existing file, need to prevent mixed encodings.  Also warn about 
conversion.)

I thought I checked that comments were decoded.  Maybe that's only on the UTF-8 
path?  Or maybe I needed more coffee.  (Hope so, that would be a messy problem 
if ASCII/Latin1 returns bytes and UTF-8 returns str!)  I'll think about this.  
Yes, it's a backwards-compatibility issue so needs care.  Would be weird if 
names are decoded but other metadata (comments) not, though.  Surely someone 
would complain if they actually used comments?  (I'm thinking maybe a 
compatibility break might be OK?  With deprecation cycle?)  I expect to check 
all execution paths accessing metadata and have a proposed patch by 12/31.
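
For reference, a quick check of what the released zipfile module returns for
names and comments.  This only exercises the standard API on a freshly written
archive; it says nothing about the patch under discussion.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("caf\u00e9.txt", b"data")  # non-ASCII name, stored as UTF-8
    zf.comment = "archive comment".encode("utf-8")  # must be bytes

with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    name_type = type(info.filename)            # member name
    member_comment_type = type(info.comment)   # per-member comment
    archive_comment = zf.comment               # archive-level comment
    print(name_type, member_comment_type, type(archive_comment))
```

Names come back as str while both kinds of comment come back as bytes, which
is the asymmetry the paragraph above worries about.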

I think I'm still short some tests, will check and write them if needed.

--

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28032] --with-lto builds segfault in many situations

2016-11-20 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

FWIW, XEmacs has used a bit of m4 magic to make --with-* and --enable-* 
equivalent for 15 years, and nobody has ever complained.  The autotools 
convention is a distinction without a difference, and confuses users when a 
program feature depends on an external library (especially where there are 
alternative implementations).

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue28032>
___



[issue28102] zipfile.py script should print usage to stderr

2016-09-12 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

CA pending (I have received PDF, but no star in tracker yet).

--

___
Python tracker 
<http://bugs.python.org/issue28102>
___



[issue28103] Style fix in zipfile.rst

2016-09-12 Thread Stephen J. Turnbull

New submission from Stephen J. Turnbull:

Makes style of references to open modes 'r', 'a', ... more consistent.

CA pending (I have received PDF, but no star in tracker yet).

--
assignee: docs@python
components: Documentation
files: zipfile-doc-style
messages: 276058
nosy: docs@python, sjt
priority: normal
severity: normal
status: open
title: Style fix in zipfile.rst
versions: Python 3.7
Added file: http://bugs.python.org/file44595/zipfile-doc-style

___
Python tracker 
<http://bugs.python.org/issue28103>
___



[issue28102] zipfile.py script should print usage to stderr

2016-09-12 Thread Stephen J. Turnbull

New submission from Stephen J. Turnbull:

Pointed out by Serhiy Storchaka in a different context.

--
components: Library (Lib)
files: zipfile-errmsg
keywords: patch
messages: 276056
nosy: sjt
priority: normal
severity: normal
status: open
title: zipfile.py script should print usage to stderr
versions: Python 3.7
Added file: http://bugs.python.org/file44594/zipfile-errmsg

___
Python tracker 
<http://bugs.python.org/issue28102>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-12 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Cleaned up a few loose ends while it's all fresh in mind.  Will ping python-dev 
in 4-6 weeks for review for 3.7.

Thanks to Serhiy for review.  The current version of the patch is much improved 
over the initial submission due to his efforts.

--
Added file: http://bugs.python.org/file44593/encoded-member-names-v3

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-12 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

If you have a workaround that's available to nonprogrammers, I'd like to hear 
about it.  I have found none; that's why I went to the trouble of putting 
together a patch even though I knew that the odds of actually getting it into 
Python 3.6 were very low.  My patch (or Sergey Dorofeev's, but that needs work 
to be applicable to trunk) does everything I've ever needed, so I suppose it 
would do for all the use cases posted so far (except umedoblock's 
encoding-guessing approach, but that can be handled by many 3rd-party 
encoding-guessing codecs, I think, and IMO should be).

--

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-11 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Can't reply on Rietveld?  Lost 2 hours work!

Patch updated (encoded-member-names-v2), most changes accepted.  Not happy 
about name change or default to cp437, I want this API to be hard to use and 
not be part of the normal process (utf-8 or cp437).  Considering errors= 
argument, but that must default to 'strict' -- the problem this patch solves is 
zip utilities extracting to files with unreadable names, surrogateescape is 
more of the same.  Two incomplete tests (assertRaisesRegex and capture main() 
stderr) still in progress, must do dayjob now.

--
Added file: http://bugs.python.org/file44571/encoded-member-names-v2

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-11 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Re: wait for 3.7 if reviewers are busy, understood.  N.B. Contributor agreement 
is now on file (I received the PDF from python.org already).

Re: existing patches:
My patch is very similar in the basic approach to Sergey Dorofeev's patch in 
issue10614.  Main differences:
(1) Sergey's patch treats the "encoding" parameter as a first class citizen 
with a default to cp437, whereas mine treats it as a special case defaulting to 
None, with utf-8 and cp437 getting special treatment as the standard encodings. 
 Subtle point, but I like it this way.
(2) My patch includes support for the argument in the __main__ script.
(3) Sergey's patch misses one execution path in the current code so needs 
update before application.

The Japanese patches by umedoblock are very Japanese-centric, and worse, they 
try to guess the encoding by the crude method of seeing what decodes 
successfully.  They are not acceptable IMO.
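
A small demonstration of why "whatever decodes successfully" is a weak signal.
The candidate list is an arbitrary assumption for illustration; the point is
that single-byte codecs accept almost any input, so several candidates succeed
and only one of them is right.

```python
raw = "\u65e5\u672c.txt".encode("shift_jis")  # bytes as stored in the archive

candidates = ["utf-8", "shift_jis", "cp437", "latin-1"]
decoded = {}
for codec in candidates:
    try:
        decoded[codec] = raw.decode(codec)
    except UnicodeDecodeError:
        pass  # this codec rejects the bytes

for codec, text in decoded.items():
    print(codec, repr(text))
```

Here utf-8 is rejected, but shift_jis, cp437 and latin-1 all "work", and a
guesser that stops at the first success depends entirely on the order of the
candidate list.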

Aaargh.  Just noticed the Japanese in test_zipfile.py.  Will change it to use 
\u escapes soon.

--

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-11 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I should have a contributor agreement form on file.

Ned Deily suggested that I try to get this patch in before the 12 noon deadline 
Sept. 12, so here it is.

I believe the patch is "safe" in the sense that its functionality needs to be 
explicitly enabled, and it should be very difficult to persuade it to 
inadvertently write to any file.  No existing execution paths should be changed 
at all.

--

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-11 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Suggested NEWS/whatsnew entry:

Add a new *memberNameEncoding* argument to the ZipFile constructor, allowing
:mod:`zipfile` to read filenames in non-conforming encodings from the
zipfile as Unicode.  This implementation assumes all member names have the same 
encoding.

Motivation:

There are applications in Japan that create zipfiles with directories 
containing filenames encoded in Shift JIS.  There may be such software in other 
countries as well.  As this is a violation of the Zip format definition, this 
library implements only an option to read such files.

Done:

(1) Add a memberNameEncoding argument to the main() function, which may be set 
from the command line with "--membernameencoding={codec}".  This command line 
option may be used with -e or -l, but not -c or -t.  There is no point to it in 
the latter, since the member names are not printed.
(2) Add a memberNameEncoding argument to the ZipFile constructor.  This is the 
only way to set it, so this is global to the ZipFile.
(3) Add this attribute to repr.
(4) Add a check that the mode is `read` in main() and in the ZipFile 
constructor, and if not invoke USAGE and exit or raise RuntimeError.
(5) When retrieving member names in constructing ZipInfo instances, check if 
memberNameEncoding is set, and if so use it, unless the UTF-8 bit is set. In 
that case, obey the UTF-8 bit, as the specified encoding is surely user error.
(6) Add a CODEC_USAGE message.
(7) Update the docs (docstrings, library reference, NEWS).
(8) Add tests:
(a) List a zipfile's SJIS-encoded directory.
(b) List a UTF-8-encoded directory and an ISO-8859-1-encoded directory as 
Shift-JIS.
(c) Check that USAGE is invoked on attempts to write a zipfile in main().
(d) Check that an appropriate error is raised on attempts to write in other 
functions.
Many other tests are run as well.
ALL TESTS PASS.
(9) Docs build without error.
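
The precedence described in items (4) and (5) can be sketched as below.  This
is an illustration of the rules as stated, not the code in the attached patch;
the function names and the snake_case parameter `member_name_encoding` are
hypothetical.

```python
UTF8_FLAG = 0x800  # general-purpose flag bit 11 in the zip header


def decode_member_name(raw, flag_bits, member_name_encoding=None):
    """Decode a raw member name following the precedence in item (5)."""
    if flag_bits & UTF8_FLAG:
        # The archive itself declares UTF-8; a user-supplied codec is
        # assumed to be a mistake and is ignored.
        return raw.decode("utf-8")
    if member_name_encoding is not None:
        return raw.decode(member_name_encoding)
    return raw.decode("cp437")  # the historical default


def check_mode(mode, member_name_encoding=None):
    """Reject the option for anything but read mode, as in item (4)."""
    if member_name_encoding is not None and mode != "r":
        raise RuntimeError("memberNameEncoding is only supported in read mode")
```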

To do (?):

(10) NEWS/whatsnew
(11) Check relevant code paths are all covered by tests.
(12) Review docs for clarity and organization.

Not done:

I don't think these are appropriate/needed at this time, but listed in case 
somebody thinks otherwise.

(13) Add a subtype of RuntimeError (see 8d)?
(14) Issue warning if both membernameencoding and utf-8 bit are set (see 5)?
(15) Support InfoZip encoding extension mentioned in APPNOTE.TXT - .ZIP File 
Format Specification, v6.3.4.
(16) Support per-member encodings (I think the zipfile standard permits, but 
not sure).

--
keywords: +needs review
status: pending -> open
Added file: http://bugs.python.org/file44564/encoded-member-names

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue28080] Allow reading member names with bogus encodings in zipfile

2016-09-11 Thread Stephen J. Turnbull

Changes by Stephen J. Turnbull:


--
components: Library (Lib)
keywords: patch
nosy: sjt
priority: normal
severity: normal
status: open
title: Allow reading member names with bogus encodings in zipfile
type: enhancement
versions: Python 3.6

___
Python tracker 
<http://bugs.python.org/issue28080>
___



[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I still think the easiest thing to do would be to make all non-ASCII characters 
instances of "invalid_character_token", self-delimiting in the same way that 
operators are.  That would automatically point to exactly the right place in 
the token stream, and requires zero changes to the error handling code.

I don't have time to look at the code, but I suspect that you could handle this 
exactly the same way that ? and $ are handled, and maybe even use the same 
token type.

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue27582>
___



[issue27257] get_addresses results in traceback with a valid? header

2016-06-08 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

OK, I can reproduce now.

$ python3.5
Python 3.5.0 (v3.5.0:374f501f4567, Sep 17 2015, 17:04:56) 
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> with open(b'lkml-exception.mail', mode = 'r') as f:
...  msg = email.message_from_file(f, policy=email.policy.SMTP)
... 
>>> msg.get_all('to')
Traceback (most recent call last):

and (except for a slight skew in line-numbering) the rest is the same as the 
tail of the OP.

The crucial part is the policy=email.policy.SMTP argument, and evidently what's 
happening is that the parser assumes that the local-part of the addr-spec is 
non-empty.  RFC 5322 does permit a quoted-string to be empty, so this is a bug 
in the email module's parser.  (I don't have a patch, sorry.)
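
To see why the policy argument matters, compare what the two policies hand
back for a header.  This sketch uses a well-formed address (the mailbox
alice@example.com is made up), so it does not reproduce the traceback; it only
shows that the structured parse happens under email.policy.SMTP when the
header is fetched.

```python
import email
import email.policy

raw = "To: Alice <alice@example.com>\n\nbody\n"

# Default (compat32) policy: the header value is returned as a plain str.
legacy = email.message_from_string(raw)
print(type(legacy["to"]).__name__, repr(legacy["to"]))

# SMTP policy: the header is parsed into structured address objects.
modern = email.message_from_string(raw, policy=email.policy.SMTP)
to = modern["to"]
for addr in to.addresses:
    print(addr.display_name, addr.username, addr.domain)
print("defects:", to.defects)
```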

Aside: although strictly speaking it's hold-your-nose-and-avert-your-eyes legal 
according to RFC 5322, RFC 5321 (SMTP) does say:

   While the above definition for Local-part is relatively permissive,
   for maximum interoperability, a host that expects to receive mail
   SHOULD avoid defining mailboxes where the Local-part requires (or
   uses) the Quoted-string form[...].

I don't see a good reason for the usage in the test case, so I'd call this 
nonconformant to RFC 5321.  I think the right way to handle it is to register a 
defect but let the parse succeed.

--

___
Python tracker 
<http://bugs.python.org/issue27257>
___



[issue27257] get_addresses results in traceback with a valid? header

2016-06-08 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

In Python 3.5, both entering the problematic header by hand with a trivial body 
and using email.message_from_string to parse it, and calling 
email.message_from_file on lkml-exception.mail, produce an 
email.message.Message with no defects and no traceback.

Without access to mail_filter.py, it's not clear what the defect might be.

--
nosy: +sjt
stage:  -> test needed
type:  -> behavior

___
Python tracker 
<http://bugs.python.org/issue27257>
___



[issue24682] Add Quick Start: Communications section to devguide

2015-12-05 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I just reviewed again, and I agree it's ready for merge.  I don't see any 
immediate need to add more.

Unfortunately, I'm not a committer.

--

___
Python tracker 
<http://bugs.python.org/issue24682>
___



[issue24682] Add Quick Start: Communications section to devguide

2015-07-22 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I tend to disagree with Ezio about a FAQ for general questions.  A pointer to 
appropriate alternatives for off-topic posts in the Mailman listinfo 
descriptions of the various lists (which can be copied into the devguide, or 
linked from there) will be sufficient for people who actually read such things 
before posting.

OTOH, once there already is a misdirected post, I feel it's appropriate to say 
"This post is off-topic here because this list is for development of Python 
itself, not developing applications with Python.  Posts like yours are ignored  
by almost all participants.  You will get help (possibly better than you could 
get on this list) on pytho...@python.org."  Adding a pointer to a FAQ which 
just repeats the same thing is browbeating IMO.  It's not like we don't have 
several people who have macros to say the above (and more politely than I did) 
who typically respond within hours to off-topic posts.  What more could a FAQ 
say?  Of course this needs to be on-list so that the poster (who usually is a 
little feckless rather than deliberately abusive) doesn't get spammed, and so 
that the multiple volunteers who handle these posts don't duplicate each other.

I personally would like to see a guideline asking participants who want to 
offer advice on the question itself to do so off-list.  
Whatever one's opinion on the utility of offering advice in response to an 
off-topic post, such advice is as off-topic as the question that elicits it.

--

___
Python tracker 
<http://bugs.python.org/issue24682>
___



[issue24682] Add Quick Start: Communications section to devguide

2015-07-22 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

If the mailing list code of conduct is to be fleshed out, Paul Moore's post is 
a good place to start IMO: 
https://mail.python.org/pipermail/python-dev/2015-July/140872.html.

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue24682>
___



[issue18814] Add codecs.convert_surrogateescape to "clean" surrogate escaped strings

2015-05-09 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Please do not add the "rehandle" functions to codecs.  They do not change the 
(duck-typed) representation of data while maintaining the semantics, they 
change the semantics of data while retaining the representation.

I suggest a "validation" submodule of the unicodedata package, or perhaps a new 
"unicodeutils" package, for these functions, as well as those that just detect 
the surrogates, etc.

Because they change the semantics of data they should be documented as 
potentially dangerous because they can't be inverted back to bytes without 
knowledge of the history of transformations they perform (and not even then in 
the case of the "replace" error handler).  This matters in applications where 
the input bytes may have been digitally signed, for example.
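
A concrete case of the loss described above, using only the built-in codec
error handlers (the sample bytes are arbitrary):

```python
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

# surrogateescape smuggles the undecodable byte through as U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # 'caf\udce9'
roundtrip = text.encode("utf-8", errors="surrogateescape")
print(roundtrip == raw)  # True: still invertible

# "Cleaning" the string with the replace handler discards that byte.
cleaned = text.encode("utf-8", errors="replace").decode("utf-8")
print(repr(cleaned))  # 'caf?'
print(cleaned.encode("utf-8") == raw)  # False: the original bytes are gone
```

Once the string has been cleaned, a signature computed over the original bytes
can no longer be verified from it.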

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue18814>
___



[issue14983] email.generator should always add newlines after closing boundaries

2013-09-21 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Following OpenPGP convention is clearly optional (or maybe a SHOULD, but the 
word "elect" makes it a pretty weak SHOULD).  RFC 2046 is a MUST, it's not a 
matter of "convention".

The problem is that a parser that works forward in the message will swallow the 
terminating CRLF of the boundary of the signed multipart, and then not find a 
CRLF to introduce the boundary that separates the content from the signature.  
By MIME rules it will treat the signature (including the unrecognized boundary) 
as an epilogue, and ignore it.  This is not at all special to multipart/signed.

--

___
Python tracker 
<http://bugs.python.org/issue14983>
___



[issue14983] email.generator should always add newlines after closing boundaries

2013-09-21 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Seems this hasn't been resolved.  I have to disagree with David's 
interpretation of RFC 2046.  The definition of a boundary says that it is 
"terminated" with a CRLF.  It also clarifies that the introducing CRLF is 
"conceptually part" of the boundary.  Thus each boundary contains both the 
leading and the trailing CRLF.  There is no exception for the final boundary 
that I can see.

This implies that when two boundaries abut, they need to be separated by *two* 
CRLFs, the trailing CRLF on the ending boundary of the inner multipart and the 
leading CRLF on the next boundary (which might be a separator or the ending 
boundary) of the containing multipart.
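
Under this reading (which is the interpretation argued for here, not a settled
one), the tail of a nested multipart would be assembled as follows.  The
helper `delimiter` and the boundary strings "inner" and "outer" are made up
for illustration.

```python
CRLF = "\r\n"


def delimiter(boundary, closing=False):
    """A boundary line including its leading and trailing CRLF."""
    return CRLF + "--" + boundary + ("--" if closing else "") + CRLF


# Last body part of the inner multipart, then the two abutting boundaries.
tail = (
    "hello"
    + delimiter("inner", closing=True)   # ends with its own CRLF
    + delimiter("outer", closing=True)   # starts with its own CRLF
)
print(repr(tail))
```

The two CRLFs between `--inner--` and `--outer--` are the ones the paragraph
above says must both be present.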

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue14983>
___



[issue18891] Master patch for content manager addtion to email package.

2013-09-03 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I'm thinking this may be overengineering, but I may as well post it and find 
out for sure. :-)  Is it worth encapsulating MIME types?  They're "really" 
pairs as far as mail handling applications are concerned, but they have a 
string representation.  So

MacPorts 16:29$ python3.3
Python 3.3.2 (default, May 21 2013, 11:48:51) 
>>> from collections import namedtuple
>>> class MIMEType(namedtuple('MIMETYPE', 'type subtype')):
...  def __str__(self):
...   return "{0}/{1}".format(self.type, self.subtype)
... 
>>> mt = MIMEType('text', 'plain')
>>> str(mt)
'text/plain'
>>> t, s = mt
>>> print('type =', t, 'subtype =', s)
type = text subtype = plain
>>> 

Obviously there needs to be a constructor that handles the 'type/sub' 
representation.
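
One possible shape for such a constructor, as a sketch only: the classmethod
name `from_string` is hypothetical, and parameters such as "; charset=utf-8"
are deliberately not handled.

```python
from collections import namedtuple


class MIMEType(namedtuple("MIMEType", "type subtype")):
    __slots__ = ()  # keep instances as light as plain tuples

    @classmethod
    def from_string(cls, value):
        """Build a MIMEType from a 'type/subtype' string."""
        maintype, sep, subtype = value.strip().partition("/")
        if not sep or not maintype or not subtype:
            raise ValueError("expected 'type/subtype', got {!r}".format(value))
        # Media type names are case-insensitive, so normalize to lowercase.
        return cls(maintype.lower(), subtype.lower())

    def __str__(self):
        return "{0}/{1}".format(self.type, self.subtype)


mt = MIMEType.from_string("Text/Plain")
print(mt, tuple(mt))
```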

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue18891>
___



[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689

2013-08-31 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

Yeah, hope is a good thing.  But I've spent the last 20 years debugging an X11 
application based on a Lisp interpreter, I save hope for fireflies, my dog, and 
my daughter these days. :-)

To the OP:

I don't follow Gentoo closely, but I have acquaintances who do.  Between them 
and the occasional foray into the forums, I've gotten the impression that 
providing CFLAGS for optimization is associated with having hard-to-debug 
problems.  They increase performance noticeably only in a few applications.  
Python being a dynamic language, function calls and even variable references 
can be quite inefficient anyway.  So I see no good reason to compile Python 
with aggressive CFLAGS, because it should be used only for moderately 
performance sensitive applications and as "glue code" and to provide UI.  
Instead, use them only for the specific applications that benefit (I suppose 
matplotlib *might* be one).

Second, I tend to agree with the maintainers.  The packages.env / pydebug.conf 
approach is the right thing for this kind of variant build.

Third, you said you hoped to get better backtraces from --with-pydebug.  That's 
a vain hope.  Such options are intended to get better backtraces of C code from 
coredumps where the interpreter breaks down, not of Python code induced by 
Python exceptions caused by problems in user code.  If you have trouble 
interpreting a backtrace, ask on python-l...@python.org or comp.lang.python 
(they mirror each other, you only need one).  If, after understanding the 
backtrace, you have an idea for a way to get a better backtrace in this case, you 
can suggest it on python-id...@python.org.

Unfortunately, reporting "this backtrace is unintelligible, please improve it" 
as an RFE on the tracker is likely to get the reply "You're right, but we don't 
know how at this time.  Patches welcome!"  But you could try that if all else 
fails.

--

___
Python tracker 
<http://bugs.python.org/issue18843>
___



[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689

2013-08-31 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

OK, I backed off the aggressive CFLAGS/CXXFLAGS to " -ggdb -pipe", and ran 
"emerge =dev-lang/python-2.7.5-r1" *once* each with and without the 
'EXTRA_ECONF="--with-pydebug"' flag.  Compiled with GCC 4.7.3.

No crash, same test results as described previously for GCC 4.6.4.

If you have other suggestions, let me know.

--

___
Python tracker 
<http://bugs.python.org/issue18843>
___



[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689

2013-08-31 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I did "emerge =dev-lang/python-2.7.5-r1" *twice* with the environment 
configuration described in msg196520, then *once* with it disabled because one 
of the cases you described was when you tried to revert to a non-debug Python.  
(Besides, I am willing to risk your crash while I'm watching for it, but not a 
time bomb that will go off when I'm on deadline :-).  All builds succeeded and 
all passed the test suite.  Here's how the debug build describes itself:

== CPython 2.7.5 (default, Sep 1 2013, 00:59:02) [GCC 4.6.4] 
==   Linux-3.9.0-x86_64-Dual_Core_AMD_Opteron-tm-_Processor_265-with-gentoo-2.2 
little-endian 

The test suite ran uneventfully (with a few DeprecationWarnings) except for 
this:

6 skips unexpected on linux2: 
test_bsddb test_bsddb3 test_tcl test_tk test_ttk_guionly 
test_ttk_textonly 

but I suppose that is expected on Gentoo.  If any of those modules (bsddb, tcl, 
tk) are built into your Python, a problem in one of those might be the culprit.

Oh, damn.  I just reread the whole thread.  For some reason I thought you were 
using gcc 4.6.4, but now I see you report 4.7.3.  OK, build with 4.7.3 and your 
flags (also restore the --with-pydebug config):

# export CFLAGS=" -ggdb -pipe -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 
-msse4 -mavx -maes -mpclmul -mpopcnt"
# export CXXFLAGS="${CFLAGS}"
# export CC=gcc-4.7.3

and we crash (from make output) immediately after linking ./python:

x86_64-pc-linux-gnu-ranlib libpython2.7.a 
gcc-4.7.3 -pthread -Wl,--hash-style=gnu -Wl,-O1 -Wl,--as-needed -L. -Xlinker 
-export-dynamic -o python \ 
Modules/python.o \ 
-L. -lpython2.7 -lpthread -ldl  -lutil   -lm   
LD_LIBRARY_PATH=/var/tmp/portage/dev-lang/python-2.7.5-r1/work/x86_64-pc-linux-gnu:
 ./python -E -S -m sysconfig --generate-posix-vars 
make: *** [pybuilddir.txt] Illegal instruction 

However, I'm pretty sure this is due to my hardware not liking your -m flags, 
not the crash you reported.  I'll try backing those flags out, but if anybody 
has a suggestion for the most aggressive set similar to yours, I'd appreciate 
it.
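One way to narrow down which -m flags the hardware rejects is to compare them against the CPU feature flags that Linux reports.  The following is only a sketch: the helper names are hypothetical, and the mapping from GCC options to /proc/cpuinfo flag names is my assumption about the usual Linux x86 spellings (for instance, SSE3 is listed as "pni").  Composite options such as -msse4 are simply ignored.

```python
# Assumed mapping from GCC -m options to Linux /proc/cpuinfo flag names.
GCC_TO_CPUINFO = {
    "-msse": "sse",
    "-msse2": "sse2",
    "-msse3": "pni",
    "-mssse3": "ssse3",
    "-msse4.1": "sse4_1",
    "-msse4.2": "sse4_2",
    "-mavx": "avx",
    "-maes": "aes",
    "-mpclmul": "pclmulqdq",
    "-mpopcnt": "popcnt",
}


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()


def unsupported(cflags, cpuinfo_text):
    """List the known -m options in cflags that the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [
        flag
        for flag in cflags.split()
        if flag in GCC_TO_CPUINFO and GCC_TO_CPUINFO[flag] not in have
    ]


# Made-up cpuinfo excerpt, for illustration only.
SAMPLE = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 pni\n"
print(unsupported("-ggdb -pipe -msse -msse3 -mavx", SAMPLE))
```

On a real host the text would come from open("/proc/cpuinfo").read() rather than from the sample string.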

But first this process is going to go sleep(25200).

--

___
Python tracker 
<http://bugs.python.org/issue18843>
___



[issue18843] Py_FatalError (msg=0x7f0e3b373232 "bad leading pad byte") at Python-2.7.5/Python/pythonrun.c:1689

2013-08-29 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

I have a Gentoo host, but I'd like to know how the OP got a debug Python in 
the first place?  The ebuild for python 2.7.5-r1 doesn't say anything about 
debug options.  "How" would preferably include information about the C compiler 
used, etc.  If there's no information, I can probably just hack --with-pydebug 
into a local ebuild, but I have something like 7 C compilers installed, I'd 
really like a good guess at the right one.  Also, did he add any optimization 
flags etc when building the debug Python?

(ebuild = the emerge configuration file that describes the build and install 
process for a package.  In "2.7.5-r1" the "r1" indicates the first revised 
ebuild for the same upstream version.)

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue18843>
___



[issue18606] Add statistics module to standard library

2013-08-08 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

A few small comments and nits.

1. I'm with the author on the question of a sum function in this module.  The 
arguments that builtin sum isn't accurate enough, and neither is math.fsum for 
cases where all data is of infinite precision, are enough for me.

2. A general percentile function should be high on the list of next additions.
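To illustrate point 1, here is a small sketch of the two failure modes.  The naive_sum helper is hypothetical and written out by hand so that the result does not depend on how a particular Python version implements the builtin sum.

```python
import math
from fractions import Fraction


def naive_sum(values):
    """Plain left-to-right float accumulation (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


floats = [0.1] * 10
print(naive_sum(floats))   # rounding error accumulates
print(math.fsum(floats))   # correctly rounded result

# With exact ("infinite precision") data, fsum converts to float and
# discards the exactness that ordinary addition would preserve.
thirds = [Fraction(1, 3)] * 2
print(math.fsum(thirds))                 # a float approximation
print(sum(thirds, Fraction(0)))          # exactly Fraction(2, 3)
```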

A substantive question:

3. Can't add_partial be used in the one-pass algorithms?

Several typos and suggested style tweaks:

4. I would find the summary more readable if grouped by function:
add_partial, sum, StatisticsError; mean, median, mode; pstdev, pvariance, 
stdev, variance.  Maybe I'd like it better if the utilities came last.  IMO 
YMMV, of course.

5. In the big comment in add_partial, "the inner loop" is mentioned.  Indeed 
this is the inner loop in statistics.sum, but there's only one loop in 
add_partial.

6. In the Limitations section of sum's docstring it says "these limitations may 
change".  Is "these limitations may be relaxed" what is meant?  I would hope 
so, but the current phrasing makes me nervous.

7. In sum, there are two comments referring to the construct 
"type(total).__float__(total)", with the first being a forward reference to the 
second.  I would find a single comment above the "isinstance(total, float)" 
test more readable.  Eg,

"""
First, accumulate a non-float sum. Until we find a float, we keep adding.
If we find a float, we exit this loop, convert the partial sum to float, and 
continue with the float code below. Non-floats are converted to float with 
'type(x).__float__(x)'. Don't call float() directly, as that converts strings 
and we don't want that. Also, like all dunder methods, we should call __float__ 
on the class, not the instance.
"""

8. The docstrings for mean and variance say they are unbiased.  This depends on 
the strong assumption of a representative (typically i.i.d.) sample.  I think 
this should be mentioned.

9. Several docstrings say "this function should be used when ...".  In fact the 
choice of which function to use is somewhat delicate.  My personal preference 
would be to use "may" rather than "should."

10. In several of the mode functions, the value is a sorted sequence.  The sort 
key should be specified, because it could be the data value or the score.
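Regarding point 7, a short sketch of why 'type(x).__float__(x)' differs from float(x).  The to_float helper is hypothetical, not a name from the proposed module.

```python
from fractions import Fraction


def to_float(x):
    """Convert via the class's __float__, so strings are rejected."""
    return type(x).__float__(x)


print(to_float(Fraction(1, 4)))   # numeric types convert as usual
print(float("1.5"))               # float() happily parses a string

try:
    to_float("1.5")
except AttributeError as exc:
    # str defines no __float__, so the string is not silently converted.
    print("rejected:", exc)
```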

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue18606>
___



[issue10686] email.Generator should use unknown-8bit encoded words for headers with 8 bit data

2011-01-06 Thread Stephen J. Turnbull

Stephen J. Turnbull  added the comment:

I agree with you that according to RFC1428, use of unknown-8bit is implicitly 
recommended.  However, note that the RFC itself is not standards-track.  I 
agree with your interpretation that in this context the email module should be 
considered a gateway.  I think it is certainly best to convert to MIME words, 
as you say.

However, if there isn't already, maybe there should be an option to bounce such 
headers back to the user?  That is, in an interactive application this should 
be an error.  Of course we should help the user by allowing and documenting 
(perhaps even defaulting to) whatever we choose for the unknown encoding.

I don't recall ever seeing unknown-8bit in the wild.  What I do see in the wild 
a lot, and specifically in Mailman moderation traffic, is simply "unknown".

A quick google for "unknown-8bit" pulled up some old (2002) discussion of 
unknown-8bit causing problems for some MTAs.  I didn't follow up to see what 
those were.

I don't have time to do it myself today (but would be willing to help out if 
you can wait up to two weeks -- I have travel coming up), but I suggest 
checking for IANA registration of "unknown" and "unknown-8bit".

--

___
Python tracker 
<http://bugs.python.org/issue10686>
___



[issue8863] Display Python backtrace on SIGSEGV, SIGFPE and fatal error

2010-12-23 Thread Stephen J. Turnbull

Stephen J. Turnbull  added the comment:

Re: msg124528

Yes, XEmacs installs a signal handler on what are normally fatal errors.  (I 
don't know about GNU Emacs but they probably do too.)

The handler has two functions: to display a Lisp backtrace and to output a 
message explaining how to report bugs (even including a brief introduction to 
the "bt" command in gdb. ;-)
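For readers on later Python versions: the standard library's faulthandler module provides a handler of this general kind for Python itself.  The sketch below only shows how such a handler is switched on and what a dump looks like; writing to a temporary file is my choice for the demonstration, and nothing here is taken from the patch under discussion.

```python
import faulthandler
import tempfile

# faulthandler writes to a raw file descriptor, so use a real file.
with tempfile.TemporaryFile(mode="w+") as log:
    # Install handlers for fatal signals such as SIGSEGV and SIGFPE.
    faulthandler.enable(file=log)
    enabled = faulthandler.is_enabled()

    # Dump the current Python traceback on demand, in the same format
    # the handler would use on a crash.
    faulthandler.dump_traceback(file=log)

    # Uninstall before the file is closed.
    faulthandler.disable()

    log.seek(0)
    report = log.read()

print(report)
```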

I personally have never found the Lisp backtrace useful, except that it can be 
used as a bug signature of sorts ("oh, I think I've seen this one before...").  
However, I suspect this is mostly because in Emacs Lisp very often you don't 
have the name of the function in the backtrace, only a compiled code object.  
So in many cases it's almost no help in localizing the fault.  Victor's patch 
does a lot better on this than XEmacs can, I suspect.

The bug reporting message, OTOH, has been useful to us for the reasons people 
give for wanting the handler installed by default.  We get more and better bug 
reports, often including C backtraces, from people who have never participated 
directly in XEmacs development before.  (It also once served the function of 
inhibiting people from sending us core files. Fortunately, I don't think that 
happens much any more. :-)  Occasionally a user will be all proud of themselves 
because "I never used gdb before," so I'm pretty sure that message is effective.

Quite frequently we see the handler itself crash if there was memory 
corruption, but certainly it gives useful output well over half the time.  So I 
want to back up Victor on those aspects.

Finally, although our experience has been very positive, note that XEmacs is not 
an embeddable library, nor is there provision in the mainline versions for 
embedding other interpreters in XEmacs.  So we've never had to worry about the 
issues that come with that.

For more technical details, you could ask Ben Wing, who put a 
lot of effort into the signal handling implementation, or Hrvoje Niksic (sorry, 
no address offhand) who posts on python-dev occasionally.  (I don't know if 
Hrvoje ever worked on the signal handlers, and he hasn't worked on XEmacs for 
years, but he was more familiar with internals than me then, and might very 
well still remember more than I ever knew. :-)  I don't think either will 
disagree with my general statements above, though.

--

___
Python tracker 
<http://bugs.python.org/issue8863>
___



[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-08 Thread Stephen J. Turnbull

Changes by Stephen J. Turnbull :


--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue9873>
___



[issue6315] locale._build_localename(locale.getdefaultlocale()) returns 'C.mac-roman'

2009-06-20 Thread Stephen J. Turnbull

New submission from Stephen J. Turnbull :

Which causes the locale machinery to spit exceptions, and the program to 
die, usually (eg, hg).

This manifests naturally on an Intel Mac, Mac OS X 10.5.7, but the 
problem behavior is in _build_localename.  When called as

_build_localename((None,'any_string'))

it returns 'C.any_string'.  I don't know of any system that supports 
anything but the POSIX portable character set in the C/POSIX locale, so
this is clearly wrong.

I suggest that when the first component of the argument is None, the
second component should be ignored.
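A minimal sketch of that suggestion, as a standalone function.  The name build_localename is hypothetical and this is not the actual code in the locale module; it only shows the proposed rule.

```python
def build_localename(localetuple):
    """Join (language, encoding), ignoring the encoding if language is None."""
    language, encoding = localetuple
    if language is None:
        # No language means the C/POSIX locale; drop any encoding.
        return "C"
    if encoding is None:
        return language
    return language + "." + encoding


print(build_localename((None, "mac-roman")))
print(build_localename(("en_US", "UTF-8")))
```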

Probably my Mac is misconfigured, but I think this is still a bug that 
should be fixed.

Observed in all of 2.5.4, 2.6.2, and 3.0.1 (vanilla MacPorts builds).

References: It's possible this is related to issue1699853, issue1176504, 
issue504219, but I don't think fixing this will help with those issues.  
It is not related to issue3067.

--
components: Library (Lib)
messages: 89537
nosy: sjt
severity: normal
status: open
title: locale._build_localename(locale.getdefaultlocale()) returns 'C.mac-roman'
type: behavior
versions: Python 2.5, Python 2.6, Python 3.0

___
Python tracker 
<http://bugs.python.org/issue6315>
___



[issue5252] 2to3 should detect and delete import of removed statvfs module

2009-02-14 Thread Stephen J. Turnbull

Stephen J. Turnbull  added the comment:

Benjamin Peterson writes:

 > Hmm. 2to3 doesn't currently mess with the stat module and os.stat the
 > more common function. Also the new interface (attributes on the objects
 > returned) has been around since 2.2.

So what?  You *can't* import a nonexistent module, so the import
statement should be removed to save the programmer the trouble.

___
Python tracker 
<http://bugs.python.org/issue5252>
___



[issue2899] Fixers find, rfind, etc in 'string' module

2009-02-13 Thread Stephen J. Turnbull

Stephen J. Turnbull  added the comment:

Maybe 2to3 could get a --pedantic or even an --annoying option?  I agree 
that it should be noisy about removed features even if actually fixing 
this kind of thing would be hard to do reliably.

--
nosy: +sjt

___
Python tracker 
<http://bugs.python.org/issue2899>
___



[issue5252] 2to3 should detect and delete import of removed statvfs module

2009-02-13 Thread Stephen J. Turnbull

New submission from Stephen J. Turnbull :

It should also try to convert stuff like

from statvfs import F_BAVAIL, F_FRSIZE
status = os.statvfs(directory)
available = status[F_BAVAIL]/((1024*1024)/status[F_FRSIZE])
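
For comparison, a possible conversion target is sketched here.  This is hand-written, not 2to3 output; the helper name available_mib is hypothetical.  It uses the named attributes of the os.statvfs result in place of the statvfs index constants, and it assumes a Unix platform where os.statvfs exists.

```python
import os


def available_mib(directory):
    """Space available to unprivileged users, in MiB (true division)."""
    status = os.statvfs(directory)
    # f_bavail and f_frsize replace status[F_BAVAIL] and status[F_FRSIZE].
    return status.f_bavail / ((1024 * 1024) / status.f_frsize)


print(available_mib("."))
```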

--
components: 2to3 (2.x to 3.0 conversion tool)
messages: 81959
nosy: sjt
severity: normal
status: open
title: 2to3 should detect and delete import of removed statvfs module
type: feature request
versions: Python 3.0

___
Python tracker 
<http://bugs.python.org/issue5252>
___