Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Stephen J. Turnbull
l...@rmi.net writes:

 > To put that more strongly, the Python user base is much larger than 
 > this list's readership.

Agreed.  Nevertheless, this is the channel (not "channel") that the
developers listen on, and substantial effort is made to let Python
users know that.  I think they do know it, too.

 > If I'm using 3.1 email, so are many others.

That's not obvious.  3.1 email is unusable for several applications.
In fact, for human factors reasons (humans are very likely to
communicate with other humans who use the same encodings, and to
accept occasional glitches they must deal with manually), MUAs are
likely to port relatively easily as "good enough" software.  But I
doubt very much that folks writing MTAs or spam filters that must run
unattended, often in long-lived, very active processes, are producing
production versions using Python 3 email yet.

 > People will accept the 3.X world you make up to a point, but it's 
 > impossible to code to a moving target, much less base a product on 
 > it.

"Impossible is nothing."  It's a decision that each individual
developer makes for herself.  I haven't heard Mailman devs complain
about the impossibility of dealing with the proposed changes, for
example.  Quite the reverse, in fact.

 > At some point, they'll simply stop trying to keep up; in fact, 
 > some already have.

Predictable and predicted.  Where's the balance?  I don't know, but
"channeling" the users is not a lot of help.  There are three worthy
goals here:

1. Taking advantage of improvements in to-be-released Pythons.
2. Not changing one's own working code.
3. Not participating in python-dev/email-sig.

Take any two; one can't have all three.

More specifically, it's interesting that most of the users you talk to
care enough to actually say they don't want more incompatible changes.
But what are we supposed to take from that?  Some fixes have to be
incompatible; do the users want the fix or the compatibility?  You
waffle (as a good representative often must):

 > Fixes are a Good Thing, of course, and this particular change's scope
 > remains to be seen; but to channel most of the users I meet out there
 > in the real world today: Enough with the 3.X changes already, eh?

But that's also a decision each developer *can* make for himself.
Python does not withdraw products, or even withdraw support, just
because the core developers release something they consider better.

If having 1 *and* 2 is so important to particular users, but they come
into conflict because of proposed changes in Python, then they're
going to have to give up 3, come here, and articulate their needs.  As
you are doing -- but to have real influence, you're going to have to
do the review of David's patch that he requests.

I really don't see how the process can work any other way.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Stephen J. Turnbull
R. David Murray writes:

 > > The MIME-charset = UNKNOWN dodge might be a better way of handling
 > > this.
 > 
 > That is a very interesting idea.  It is the *right* thing to do, since it
 > would mean that a message parsed as bytes could be generated via Generator
 > and passed to, say, smtplib without losing any information.  However,
 > it's not exactly trivial to implement, since issues of runs of characters
 > and line re-wrapping need to be dealt with.  Perhaps Header can be
 > made to handle bytes in order to do this; I'll have to look into
 > it.

Ouch.  RFC 822 line wrapping is a bytes->bytes transformation, and the
client shouldn't see it at all unless it inspects the wire format.
MIME-encoding is a text->bytes transformation, again an internal
matter.  The constraints on the wire format mean that the MIME-encoder
needs to be careful about encoded-word length.  ISTM that all you
need to know, assuming that this is a method on a Header, and it's
normally invoked just before conversion to bytes, is the codec and the
CTE, and both can be optional (default to 'utf-8' and a value
depending on the proportion of encodable characters).
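
A rough sketch of what "a value depending on the proportion of encodable
characters" could look like. The helper name choose_cte and the length
estimates are assumptions made for illustration, not the email package's
actual rule: it compares the approximate size of an RFC 2047 "Q" encoding
with that of a "B" (base64) encoding and picks the shorter.

```python
def choose_cte(text, codec="utf-8"):
    """Pick 'q' or 'b' for an encoded word (hypothetical helper)."""
    raw = text.encode(codec)
    # Bytes that Q-encoding can emit as a single character: ASCII letters,
    # digits, a few punctuation marks, and space (written as "_").
    safe = sum(
        1 for b in raw
        if b < 128 and (chr(b).isalnum() or b in b"!*+-/ ")
    )
    q_len = safe + 3 * (len(raw) - safe)   # every other byte becomes =XX
    b_len = 4 * ((len(raw) + 2) // 3)      # base64 expands 3 bytes to 4 chars
    return "q" if q_len <= b_len else "b"


mostly_ascii = choose_cte("hello world")
mostly_cjk = choose_cte("日本語のテキスト")
```

Mostly-ASCII headers stay readable under "q", while text that is almost
entirely non-ASCII is shorter under "b".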

You take the header, encode according to the codec, then start
MIME-encoding according to the CTE.  The maximum size of encoded words
is chosen to fit on a line within 78 bytes.  The number of bytes
encoded in each word depends only on the size of metadata associated
with the word.  (Sure you could make it prettier for those reading it
with an "MUA" like less, but I don't think that's really worth
anybody's time.)
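
For illustration, the existing email.header.Header class already does
something close to this when given a charset and a maximum line length.
The sketch below assumes a Subject header and UTF-8; encode_subject and
decode_subject are names invented for this example.

```python
from email.header import Header, decode_header


def encode_subject(text, charset="utf-8", maxlinelen=78):
    """Return the folded, MIME-encoded form of a Subject value."""
    header = Header(text, charset=charset, header_name="Subject",
                    maxlinelen=maxlinelen)
    return header.encode()


def decode_subject(wire):
    """Reassemble the original text from the encoded words."""
    return "".join(
        chunk.decode(cs or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, cs in decode_header(wire)
    )


text = " ".join(["Grüße aus Köln"] * 6)
wire = encode_subject(text)      # several encoded words, one per line
restored = decode_subject(wire)
```

The client only ever sees text and restored; the encoded words and the
folding stay an internal matter of the wire format.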

*If* you have an 8-bit value of unknown encoding on input, this will
appear in the Header's value as a surrogate.  Hm, OK, I see the
problem ... as usual, it's that the only efficient thing to do is
encode using surrogate-escape which loses the information that these
are invalid bytes.  Would it really be that bad to add an O(length)
component where you examine the string for surrogates (and too-long
words, for that matter), and chop off those pieces for MIME encoding?
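
The O(length) pass could be sketched roughly as follows. This is only an
illustration under the assumption that undecodable input bytes were mapped
to lone surrogates (U+DC80..U+DCFF) by the surrogateescape error handler;
split_on_surrogates is a made-up name.

```python
def split_on_surrogates(value):
    """Split a str into (text, is_escaped) runs in one pass."""
    runs = []
    for ch in value:
        # surrogateescape maps each undecodable byte 0x80..0xFF
        # to a lone surrogate in the range U+DC80..U+DCFF.
        escaped = "\udc80" <= ch <= "\udcff"
        if runs and runs[-1][1] == escaped:
            runs[-1][0].append(ch)
        else:
            runs.append(([ch], escaped))
    return [("".join(chars), escaped) for chars, escaped in runs]


raw = b"caf\xe9 au lait"
value = raw.decode("ascii", "surrogateescape")
runs = split_on_surrogates(value)
# The escaped run can be turned back into the original raw byte(s).
unknown_bytes = runs[1][0].encode("ascii", "surrogateescape")
```

The escaped runs are the pieces that would get the "unknown charset"
treatment; the other runs are ordinary text.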

 > >  > Presumably you are suggesting that email5 be smart enough to turn my
 > >  > example into properly UTF-8/CTE encoded text.
 > > 
 > > No, in general that's undecidable without asking the originator,
 > > although humans can often make a good guess.
 > 
 > I was talking about unicode input, though, where you do know (modulo
 > the language differences that unicode hasn't yet sorted out).

I don't understand why this is difficult.  As far as what Unicode has
and hasn't sorted out, that's not your job AFAICS.  If clients want a
specific codec or other language-based style, they'd better specify it
themselves.  Else, you just stuff the Unicode into a UTF-8-encoded
bytes, and go from there.  This is *why* Unicode was designed, so that
software could do something standard and sane with text which needs to
be readable but not exquisitely crafted literary works.  No?  If you
want beauty, then use a markup language.

 > Right, but I was talking about my python3 example, where I was using
 > the email5 parser to (unsuccessfully) parse unicode.  *That's* the thing
 > email5 can't really handle, but email6 will be able to.

For email5 it would be an extension, yes, but I don't see why it would
be hard to handle Unicode input, assuming it's *really* Unicode,
unless you want to cater to "legacy" systems that might not understand
Unicode (or at least would prefer an alternative encoding).  Since
it's an extension, I don't think that's your problem, and the people
who would really like this extension (eg, the Japanese) are used to
dealing with mojibake issues.  (Of course, as an extension, you don't
need to do it at all.  This is just speculation.)

The problem would be with careless clients of email5 that find a way
to hand it bogus Unicode (eg, by inappropriately using the latin-1
codec to get a binary representation of their bytes in Unicode), but I'm
not sure how big a problem that would be.
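
A small illustration of the "bogus Unicode" case, assuming the client
received UTF-8 bytes and decoded them with latin-1 to get a str:

```python
original = "Köln"
wire_bytes = original.encode("utf-8")

# latin-1 maps every byte to a code point, so this never fails,
# but the result is mojibake rather than the intended text.
bogus = wire_bytes.decode("latin-1")

# The trick is lossless for the bytes, which is why it is tempting.
recovered = bogus.encode("latin-1").decode("utf-8")
```

The library has no way to tell bogus from genuine text, since both are
perfectly valid str objects.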

 > Thank you very much for this piece of perspective.  I hadn't thought
 > about it that clearly before, but what you say makes perfect sense to me,
 > and is in fact the implicit perspective I've been working from when
 > working on the email6 stuff.

You're welcome, of course, and it makes me feel much better about
email6.  (Not that I had any real worries, but here we are about
halfway up a 100m cliff, and the trail just widened from 20cm to
2m. :-)



Re: [Python-Dev] python.org going down?

2010-10-07 Thread Tres Seaver
On 10/07/2010 07:42 PM, "Martin v. Löwis" wrote:
>> FWIW, PyPI was inaccessible for some longish period of time this morning.
> 
> That I can confirm (assuming you are talking about the UTC night).
> However, it stopped around 5:00 UTC, so it's clearly unrelated to
> anything that happened reportedly 2 hours ago.

If you have evidence in the logs for one and not the other, then they
are likely unrelated.  I was speculating about possible issues with,
say, the network fabric at the ISP, or something, which could cause
apparent outages from the outside viewer's perspective, without
necessarily leaving any trace on the machines hosting the sites, except
a drop in traffic in the logs.


Tres
- -- 
===
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


Re: [Python-Dev] python.org going down?

2010-10-07 Thread geremy condra
On Thu, Oct 7, 2010 at 4:17 PM, "Martin v. Löwis"  wrote:
>>>
>>> Nothing on python.org suggests that this has actually happened. Could
>>> it be that the issue is on your end?
>>
>> It's possible, but I was first notified that it was happening by
>> someone literally 3000 miles away. It seems unlikely that we would
>> both have had the same problem at the same time under those
>> conditions, doesn't it?
>
> True. However, I really cannot see anything on the machines that
> indicates some outage. I'm still unsure what "it" is that was happening,
> so it's also difficult to analyse this further.

Chalk it up to a mystery of the internet, I guess. It still
seems strange to me that two people would get the same behavior so far
away from each other and not have it be on that end though.

Geremy Condra


Re: [Python-Dev] python.org going down?

2010-10-07 Thread Martin v. Löwis
> FWIW, PyPI was inaccessible for some longish period of time this morning.

That I can confirm (assuming you are talking about the UTC night).
However, it stopped around 5:00 UTC, so it's clearly unrelated to
anything that happened reportedly 2 hours ago.

Regards,
Martin


Re: [Python-Dev] python.org going down?

2010-10-07 Thread Tres Seaver
On 10/07/2010 07:17 PM, "Martin v. Löwis" wrote:
>>>
>>> Nothing on python.org suggests that this has actually happened. Could
>>> it be that the issue is on your end?
>>
>> It's possible, but I was first notified that it was happening by
>> someone literally 3000 miles away. It seems unlikely that we would
>> both have had the same problem at the same time under those
>> conditions, doesn't it?
> 
> True. However, I really cannot see anything on the machines that
> indicates some outage. I'm still unsure what "it" is that was happening,
> so it's also difficult to analyse this further.

FWIW, PyPI was inaccessible for some longish period of time this morning.



Tres.
- -- 
===
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


Re: [Python-Dev] python.org going down?

2010-10-07 Thread Martin v. Löwis
>>
>> Nothing on python.org suggests that this has actually happened. Could
>> it be that the issue is on your end?
> 
> It's possible, but I was first notified that it was happening by
> someone literally 3000 miles away. It seems unlikely that we would
> both have had the same problem at the same time under those
> conditions, doesn't it?

True. However, I really cannot see anything on the machines that
indicates some outage. I'm still unsure what "it" is that was happening,
so it's also difficult to analyse this further.

Regards,
Martin


Re: [Python-Dev] python-2.6.6 coredump running newspipe

2010-10-07 Thread Brett Cannon
On Thu, Oct 7, 2010 at 15:53, Thomas Klausner  wrote:
> On Thu, Oct 07, 2010 at 11:15:27AM -0700, Brett Cannon wrote:
>> It's best to report issues at bugs.python.org.
>
> Are different people reading it there?

Yes, but we also just don't accept bug reports here as they just get
lost in mailing list traffic.

>
> Following your suggestion, I've created
> http://bugs.python.org/issue10047

Thanks.

-Brett

>
> Please let me know what kind of further details you need.
> I still have the python sources, executable with debugging symbols and
> core dump lying around, so right now I can give feedback most quickly.
>
> Thanks,
>  Thomas
>


Re: [Python-Dev] python.org going down?

2010-10-07 Thread geremy condra
On Thu, Oct 7, 2010 at 3:52 PM, "Martin v. Löwis"  wrote:
> On 08.10.2010 00:00, geremy condra wrote:
>> Seems like python.org has gone down and come back up a couple of times
>> in the last few minutes, is this intentional?
>
> Nothing on python.org suggests that this has actually happened. Could
> it be that the issue is on your end?

It's possible, but I was first notified that it was happening by
someone literally 3000 miles away. It seems unlikely that we would
both have had the same problem at the same time under those
conditions, doesn't it?

Geremy Condra


Re: [Python-Dev] python-2.6.6 coredump running newspipe

2010-10-07 Thread Thomas Klausner
On Thu, Oct 07, 2010 at 11:15:27AM -0700, Brett Cannon wrote:
> It's best to report issues at bugs.python.org.

Are different people reading it there?

Following your suggestion, I've created
http://bugs.python.org/issue10047

Please let me know what kind of further details you need.
I still have the python sources, executable with debugging symbols and
core dump lying around, so right now I can give feedback most quickly.

Thanks,
 Thomas


Re: [Python-Dev] python.org going down?

2010-10-07 Thread Martin v. Löwis
On 08.10.2010 00:00, geremy condra wrote:
> Seems like python.org has gone down and come back up a couple of times
> in the last few minutes, is this intentional?

Nothing on python.org suggests that this has actually happened. Could
it be that the issue is on your end?

Regards,
Martin


[Python-Dev] python.org going down?

2010-10-07 Thread geremy condra
Seems like python.org has gone down and come back up a couple of times
in the last few minutes, is this intentional?

Geremy Condra


Re: [Python-Dev] Rietveld integration into Roundup

2010-10-07 Thread Antoine Pitrou
On Thursday 07 October 2010 at 23:17 +0200, "Martin v. Löwis" wrote:
> > As I said, most patches are supposed to be produced against py3k HEAD,
> > so you could try just that as the primary heuristic.
> 
> I think this is impractical. There are tons of patches (the majority)
> which are in the tracker and *not* against py3k head. So this heuristic
> will only cover a small minority.

I don't understand what the problem is. As long as the patch *applies*
cleanly on py3k HEAD, it's ok. If it doesn't, the patch will have to be
re-generated anyway.

> > I think you may be trying too hard to find smart ways of inferring the
> > correct svn rev and branch, while developing against latest py3k is,
> > most of the time, the required standard. Outdated patches are not
> > really helpful anyway
> 
> Hmm. So how many versions should I go back in py3k until giving up?

Zero.

Regards

Antoine.




Re: [Python-Dev] Rietveld integration into Roundup

2010-10-07 Thread Martin v. Löwis
> As I said, most patches are supposed to be produced against py3k HEAD,
> so you could try just that as the primary heuristic.

I think this is impractical. There are tons of patches (the majority)
which are in the tracker and *not* against py3k head. So this heuristic
will only cover a small minority.

> I think you may be trying too hard to find smart ways of inferring the
> correct svn rev and branch, while developing against latest py3k is,
> most of the time, the required standard. Outdated patches are not
> really helpful anyway

Hmm. So how many versions should I go back in py3k until giving up?

Regards,
Martin



Re: [Python-Dev] Rietveld integration into Roundup

2010-10-07 Thread Antoine Pitrou
On Thu, 07 Oct 2010 23:04:54 +0200
"Martin v. Löwis"  wrote:
> 
> However, if I get something like
> 
> diff -r e981b6cc56b0 Include/longintrepr.h
> --- a/Include/longintrepr.h   Thu Oct 07 03:12:19 2010 +0200
> +++ b/Include/longintrepr.h   Thu Oct 07 13:53:41 2010 +0200
> 
> I have no clue where I should look for the base revision
> that the patch was created against.

As I said, most patches are supposed to be produced against py3k HEAD,
so you could try just that as the primary heuristic.

I think you may be trying too hard to find smart ways of inferring the
correct svn rev and branch, while developing against latest py3k is,
most of the time, the required standard. Outdated patches are not
really helpful anyway.

Regards

Antoine.


Re: [Python-Dev] Rietveld integration into Roundup

2010-10-07 Thread Martin v. Löwis
On 04.10.2010 03:56, Daniel Stutzbach wrote:
> On Sat, Oct 2, 2010 at 3:55 PM, "Martin v. Löwis" wrote:
> 
> I'll have to come up with a better way to determine the branch
> which a patch was created on.
> 
> 
> That would also be helpful for those of us using DVCS software to talk
> to the svn server. :-)

Not sure in what way that would be helpful: I now *have* a better way
to determine the branch a patch was created on, but I completely fail
to see how it would help the DVCS software users. I take the svn
revision number from the patch, and then search back in history until
I get a revision where all chunks patch cleanly without any changes
to the line numbers; this has already helped a lot. I also have a
database listing all file names, so I can deal with patches that were
created for a subdirectory.
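
The search described above could be sketched like this. It is only an
outline: find_base_revision, the applies_cleanly callback and the
max_steps cut-off are assumptions for illustration, and the real check
would have to run the patch against a checkout of each revision.

```python
def find_base_revision(start_rev, applies_cleanly, max_steps=100):
    """Walk back from start_rev until the patch applies without offsets.

    applies_cleanly(rev) is a caller-supplied check (hypothetical) that
    returns True when every chunk applies with unchanged line numbers.
    """
    for rev in range(start_rev, max(start_rev - max_steps, 0), -1):
        if applies_cleanly(rev):
            return rev
    return None  # give up after max_steps revisions


# Toy stand-in for the real check: pretend the patch fits r85290 and older.
base = find_base_revision(85299, lambda rev: rev <= 85290)
```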

However, if I get something like

diff -r e981b6cc56b0 Include/longintrepr.h
--- a/Include/longintrepr.h Thu Oct 07 03:12:19 2010 +0200
+++ b/Include/longintrepr.h Thu Oct 07 13:53:41 2010 +0200

I have no clue where I should look for the base revision
that the patch was created against. I could guess that the base
revision was probably created on Oct 07 3:12:19, which would make
that r85299. Not sure how reliable this is, though - will the
DVCS software always use the same time stamps in the diff for the
base version as are recorded in the original repository?

Also, there are no directories called "a" and "b" in the repository.
Will DVCS software always use these pseudo directories?
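
As an illustration of what can be pulled out of such a header, here is a
sketch that extracts the changeset id, strips the "a/" pseudo directory
and parses the timestamp. It assumes the layout shown above and paths
without spaces; parse_hg_header is a name invented for this example, and
whether the timestamp matches the svn commit time is exactly the open
question.

```python
import re
from datetime import datetime

DIFF_RE = re.compile(r"^diff -r (?P<node>[0-9a-f]+) (?P<path>\S+)$")


def parse_hg_header(lines):
    """Extract changeset id, path and old-file timestamp from a diff header."""
    info = {}
    for line in lines:
        match = DIFF_RE.match(line)
        if match:
            info["node"] = match.group("node")
            info["path"] = match.group("path")
        elif line.startswith("--- "):
            old_path, stamp = line[4:].split(None, 1)
            # Drop the "a/" pseudo directory if it is present.
            if old_path.startswith("a/"):
                old_path = old_path[2:]
            info["old_path"] = old_path
            info["old_time"] = datetime.strptime(
                stamp.strip(), "%a %b %d %H:%M:%S %Y %z")
    return info


header = [
    "diff -r e981b6cc56b0 Include/longintrepr.h",
    "--- a/Include/longintrepr.h Thu Oct 07 03:12:19 2010 +0200",
    "+++ b/Include/longintrepr.h Thu Oct 07 13:53:41 2010 +0200",
]
info = parse_hg_header(header)
```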

Regards,
Martin


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Oleg Broytman
On Thu, Oct 07, 2010 at 09:12:13PM +0200, Victor Stinner wrote:
> On Thursday 07 October 2010 18:44:19, Oleg Broytman wrote:
> >    My filesystems are always koi8-r, but sometimes I work with programs in
> > utf-8 locale. Just an example...
> 
> Are programs able to display correctly non-ascii filenames if your locale 
> encoding is different than your filesystem encoding?

   Most of them don't because, as you say, most programs assume the fs
encoding to be the same as the stdio locale. But some programs are more clever;
for example, one can define the G_FILENAME_ENCODING env var to guide GTK2/GLib
programs; it can be a fixed encoding or the special value "@locale". On the
other hand, there are programs that ignore the locale completely and read/write
filenames using their own fixed encoding; for example, the Transmission
bittorrent client reads/writes files in the encoding defined in the .torrent
metafile.
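
A small sketch of the mismatch, assuming a koi8-r file name read by a
program that expects utf-8. The sample name is made up; the two encodings
reported by describe_encodings depend on the machine it runs on.

```python
import locale
import sys


def describe_encodings():
    """Report the two encodings this Python process is using."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
    }


name = "файл.txt"
on_disk = name.encode("koi8-r")          # bytes as stored by the filesystem

try:
    on_disk.decode("utf-8")              # what a utf-8 locale program tries
    misread = False
except UnicodeDecodeError:
    misread = True

# surrogateescape keeps the bytes intact even when they cannot be displayed.
roundtrip = on_disk.decode("utf-8", "surrogateescape").encode(
    "utf-8", "surrogateescape")
```

A program that honours a separate filename encoding would decode on_disk
with koi8-r instead and get the readable name back.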

Oleg.
-- 
     Oleg Broytman            http://phd.pp.ru/            p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Victor Stinner
On Thursday 07 October 2010 18:35:09, M.-A. Lemburg wrote:
> Victor Stinner wrote:
> > Hi,
> > 
> > A PYTHONFSENCODING environment variable was added to Python 3.2: issue
> > #8622. This variable introduces an inconsistency because the filesystem
> > and the locale encodings can now be different.
> > 
> > There are (at least) four issues related to this problem. We have 2
> > choices to fix these issues:
> > 
> >  (a) use the same encoding to encode and decode values (it can be
> >      different for each issue)
> > 
> >  (b) remove PYTHONFSENCODING variable and raise an error if locale and
> >      filesystem encodings are different (ensure that both encodings are
> >      the same)
> > 
> > Even if choice (a) is not easy to implement, it is feasible and I already
> > wrote some patches.
> > 
> > I don't understand how Python interacts with other programs that ignore the
> > PYTHONFSENCODING environment variable. It's like Python uses its own
> > "locale".
> > 
> > Choice (b) looks easy to implement, but... there is the problem of Mac OS
> > X. Mac OS X uses utf-8 encoding for the filesystem (and not the locale
> > encoding), whereas it looks like the locale encoding is used for the
> > command line arguments. See issue #4388 for more information.
> > 
> > There is also maybe a useful use case for PYTHONFSENCODING, but I
> > don't remember which one :-)
> 
> You have to differentiate between the meaning of a file system
> encoding and the locale:
> 
> A file system encoding defines how the applications interact
> with the file system.
> 
> A locale defines how the user expects to interact with the
> application.

What is the encoding of the command line arguments? Locale or filesystem 
encoding? Is it different if an argument is a filename or a path?

-- 
Victor Stinner
http://www.haypocalc.com/


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Victor Stinner
On Thursday 07 October 2010 18:44:19, Oleg Broytman wrote:
> On Thu, Oct 07, 2010 at 06:35:09PM +0200, M.-A. Lemburg wrote:
> > It is well possible that the two are different. Mac OS X is
> > just one example. Another common example is having a Unix
> > account using the C locale (=ASCII) while working on a UTF-8
> > file system.
> 
>    My filesystems are always koi8-r, but sometimes I work with programs in
> utf-8 locale. Just an example...

Are programs able to display correctly non-ascii filenames if your locale 
encoding is different than your filesystem encoding?

-- 
Victor Stinner
http://www.haypocalc.com/


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Barry Warsaw
On Oct 07, 2010, at 04:40 AM, Stephen J. Turnbull wrote:

> > And the email API currently promises not to raise during parsing,
> > which is a contract my patch does not change.
>
>Which is a contract that has historically been broken frequently.
>Unhandled UnicodeErrors have been one of the most common causes of
>queue stoppage in Mailman (exceeded only by configuration errors
>AFAICS).  I haven't seen any reports for a while, but with the email
>package being reengineered from the ground up, the possibility of
>regression can't be ignored.

I'm fairly certain that most of the modern causes of this are post-parse
modifications of the message.  IOW, in Mailman's architecture, we try to parse
the raw data into a Message object tree very early in the pipeline, and then a
pickled version of that gets passed between the queue runners.  If the initial
parse fails, there's almost literally nothing Mailman can do with the original
data other than delete it.
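
To illustrate the "does not raise during parsing" contract, here is a
sketch using email.message_from_bytes as found in current Python 3. The
sample message is made up: it promises a multipart boundary that never
appears. The parser is expected to return a Message anyway and to record
the problems on its defects list rather than raising.

```python
import email

RAW = (
    b"Subject: test\n"
    b"Content-Type: multipart/mixed; boundary=\"XXX\"\n"
    b"\n"
    b"body that never opens the promised boundary\n"
)

# Parsing a structurally broken message still yields a Message object.
msg = email.message_from_bytes(RAW)

# Problems found along the way are recorded instead of raised.
defect_names = [type(defect).__name__ for defect in msg.defects]
subject = msg["Subject"]
```

An application such as a queue runner can then inspect defect_names and
decide for itself whether the message is usable.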

Where we've gotten into trouble before has been things like adding the Subject
prefixes and such.  That seems like application logic that the email package
can't really get involved with, and indeed Mailman has built up a raft of
defense for failures of this kind.

-Barry




Re: [Python-Dev] python-2.6.6 coredump running newspipe

2010-10-07 Thread Brett Cannon
It's best to report issues at bugs.python.org.

On Thu, Oct 7, 2010 at 05:17, Thomas Klausner  wrote:
> Hi!
>
> I'm running newspipe-1.1.9, an RSS reader
> (http://newspipe.sourceforge.net/), on NetBSD-5.99.11/amd64 using
> Python-2.6.6.
>
> Sometimes, it core dumps with particular feeds in the configuration (I
> guess depending on the feed, because when I comment out the offending
> feed in the opml file, it runs through to completion).
>
> The backtrace looks like this:
> Core was generated by `python'.
> Program terminated with signal 10, Bus error.
> #0  0x7f7ffdc35a21 in PyOS_snprintf (str=0x7f7ff5dfe3d8 "@", size=120, 
> format=0x1 ) at Python/mysnprintf.c:43
> 43      {
> (gdb) bt
> #0  0x7f7ffdc35a21 in PyOS_snprintf (str=0x7f7ff5dfe3d8 "@", size=120, 
> format=0x1 ) at Python/mysnprintf.c:43
> #1  0x7f7ffdc471a6 in PyOS_ascii_formatd (buffer=0x7f7ff5dfe3d8 "@", 
> buf_size=120, format=0x7f7ff5dfe388 "%.2f", d=0.15256118774414062) at 
> Python/pystrtod.c:455
> #2  0x7f7ffdbaa7fa in formatfloat (buf=0x7f7ff5dfe3d8 "@", buflen=120, 
> flags=16, prec=2, type=102, v=0x7f7ffcc6d510) at Objects/stringobject.c:4378
> #3  0x7f7ffdbabd32 in PyString_Format (format=0x7f7ffc8144e0, 
> args=0x7f7ffcc6d510) at Objects/stringobject.c:4943
> #4  0x7f7ffdbaa3b0 in string_mod (v=0x7f7ffc8144e0, w=0x7f7ffcc6d510) at 
> Objects/stringobject.c:4116
> #5  0x7f7ffdb459db in binary_op1 (v=0x7f7ffc8144e0, w=0x7f7ffcc6d510, 
> op_slot=32) at Objects/abstract.c:917
> #6  0x7f7ffdb45c81 in binary_op (v=0x7f7ffc8144e0, w=0x7f7ffcc6d510, 
> op_slot=32, op_name=0x7f7ffdc6c089 "%") at Objects/abstract.c:969
> #7  0x7f7ffdb467ad in PyNumber_Remainder (v=0x7f7ffc8144e0, 
> w=0x7f7ffcc6d510) at Objects/abstract.c:1221
> #8  0x7f7ffdc08a03 in PyEval_EvalFrameEx (f=0x7f7fefa1dab0, throwflag=0) 
> at Python/ceval.c:1180
> #9  0x7f7ffdc1175f in fast_function (func=0x7f7ff8a9bed8, 
> pp_stack=0x7f7ff5dfeae8, n=1, na=1, nk=0) at Python/ceval.c:3836
> #10 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dfeae8, oparg=1) at 
> Python/ceval.c:3771
> #11 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7fee920420, throwflag=0) 
> at Python/ceval.c:2412
> #12 0x7f7ffdc0f715 in PyEval_EvalCodeEx (co=0x7f7ffcc247b0, 
> globals=0x7f7ffd1c5880, locals=0x0, args=0x7f7ff5b0aac8, argcount=8, 
> kws=0x7f7ff5b0ab08, kwcount=0, defs=0x7f7ff8d3c4e8,
>    defcount=5, closure=0x0) at Python/ceval.c:3000
> #13 0x7f7ffdc1184a in fast_function (func=0x7f7ff8a9cc80, 
> pp_stack=0x7f7ff5dfeff8, n=8, na=8, nk=0) at Python/ceval.c:3846
> #14 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dfeff8, oparg=7) at 
> Python/ceval.c:3771
> #15 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7ff5b0a820, throwflag=0) 
> at Python/ceval.c:2412
> #16 0x7f7ffdc1175f in fast_function (func=0x7f7ff8a9e140, 
> pp_stack=0x7f7ff5dff358, n=1, na=1, nk=0) at Python/ceval.c:3836
> #17 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dff358, oparg=0) at 
> Python/ceval.c:3771
> #18 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7ff5b0a420, throwflag=0) 
> at Python/ceval.c:2412
> #19 0x7f7ffdc1175f in fast_function (func=0x7f7ffca1db90, 
> pp_stack=0x7f7ff5dff6b8, n=1, na=1, nk=0) at Python/ceval.c:3836
> #20 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dff6b8, oparg=0) at 
> Python/ceval.c:3771
> #21 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7ff5b03190, throwflag=0) 
> at Python/ceval.c:2412
> #22 0x7f7ffdc0f715 in PyEval_EvalCodeEx (co=0x7f7ffca0d4e0, 
> globals=0x7f7ffca473a0, locals=0x0, args=0x7f7ff04d3e68, argcount=1, kws=0x0, 
> kwcount=0, defs=0x0, defcount=0, closure=0x0)
>    at Python/ceval.c:3000
> #23 0x7f7ffdb7a612 in function_call (func=0x7f7ffca1daa0, 
> arg=0x7f7ff04d3e50, kw=0x0) at Objects/funcobject.c:524
> #24 0x7f7ffdb495e8 in PyObject_Call (func=0x7f7ffca1daa0, 
> arg=0x7f7ff04d3e50, kw=0x0) at Objects/abstract.c:2492
> #25 0x7f7ffdb5eca0 in instancemethod_call (func=0x7f7ffca1daa0, 
> arg=0x7f7ff04d3e50, kw=0x0) at Objects/classobject.c:2579
> #26 0x7f7ffdb495e8 in PyObject_Call (func=0x7f7ff8ac2a00, 
> arg=0x7f7ffd112050, kw=0x0) at Objects/abstract.c:2492
> #27 0x7f7ffdc10cd3 in PyEval_CallObjectWithKeywords (func=0x7f7ff8ac2a00, 
> arg=0x7f7ffd112050, kw=0x0) at Python/ceval.c:3619
> #28 0x7f7ffdc4e69f in t_bootstrap (boot_raw=0x7f7ffd1b4590) at 
> ./Modules/threadmodule.c:428
> #29 0x7f7ffd90ba32 in pthread_setcancelstate () from 
> /usr/lib/libpthread.so.1
> #30 0x7f7ffd26e9b0 in ___lwp_park50 () from /usr/lib/libc.so.12
> #31 0x in ?? ()
> (gdb) fr 1
> #1  0x7f7ffdc471a6 in PyOS_ascii_formatd (buffer=0x7f7ff5dfe3d8 "@", 
> buf_size=120, format=0x7f7ff5dfe388 "%.2f", d=0.15256118774414062) at 
> Python/pystrtod.c:455
> 455         PyOS_snprintf(buffer, buf_size, format, d);
> (gdb) l
> 450             format = tmp_format;
> 451         }
> 452
> 453
> 454         /* Have PyOS_snprintf do the hard work */
> 455

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 16:03:18 -, l...@rmi.net wrote:
> I'm forwarding a link to the code of these clients to David by 
> private email in case they might be useful as a test case (O'Reilly
> has already posted them ahead of the book, but they may be a bit too
> heavy for use in formal testing).

Thanks very much.  I will take a look, and expect they will
be helpful.

> The email package is obviously less than ideal today, and there are
> many other clients for it besides my own, of course.  But making it 
> backward incompatible at this point is likely to be seen as a big 
> negative to newcomers evaluating 3.X viability.  And as I tried to 
> make clear in June, this list should carefully weigh the PR cost of 
> pulling the rug out from under those brave souls who have already 
> taken the time to accommodate the 3.X world you've mandated.

Well, as I have said before the plan is to provide backward compatibility
in email6, so that you only need to change your code if you want to
take advantage of improved or new functionality.  If this turns out not
to be possible for some reason, then we aren't going to suddenly stop
supporting email5.  That's not the Python Way :)  (Example: we added
argparse post-3.0, and lots of people wanted to deprecate optparse,
but we aren't planning on removing optparse.)

Do you see any issues with the patch I'm proposing?  My goal is to make
things work that didn't work before, but nothing that worked before
should stop working, if I do my job right.

The one *potentially* backward-incompatible change that I'm consciously
considering (that is, any other backward incompatibilities will be bugs)
is having DecodedGenerator fully decode headers and emit full unicode,
rather than the ASCII-only unicode that Generator emits.  Can you think
of any problem that that would cause?  A quick grep indicates your own
code does not use that generator (possibly because currently it does not
do that decoding).  I could, of course, only enable header decoding if
a flag is passed requesting it, and as I write this I realize that that
is indeed what I should do.  Even though I haven't been able to think of a
case where DecodedGenerator producing non-ASCII unicode would be an issue,
that doesn't mean there isn't one :)
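To illustrate what "fully decode headers, but only if a flag is passed" could look like, here is a minimal sketch built on the existing email.header.decode_header and make_header functions. The helper name header_to_unicode and its decode flag are hypothetical, invented for this illustration; they are not part of the proposed patch.

```python
from email.header import decode_header, make_header


def header_to_unicode(raw_value, decode=False):
    """Return a header value as text.

    Hypothetical helper: the name and the `decode` flag are illustrative
    only. With decode=False the ASCII-only form is returned unchanged,
    which keeps the old behaviour as the default.
    """
    if not decode:
        return raw_value
    # decode_header splits the value into (bytes-or-str, charset) pairs;
    # make_header reassembles them into a Header whose str() is unicode.
    return str(make_header(decode_header(raw_value)))


raw = "=?utf-8?q?p=C3=B6stal?="
print(header_to_unicode(raw))               # ASCII-only encoded word
print(header_to_unicode(raw, decode=True))  # decoded, non-ASCII text
```

Keeping the flag off by default means existing callers see exactly the output they saw before.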

> To put that more strongly, the Python user base is much larger than 
> this list's readership.  If I'm using 3.1 email, so are many others.
> People will accept the 3.X world you make up to a point, but it's 
> impossible to code to a moving target, much less base a product on 
> it.  At some point, they'll simply stop trying to keep up; in fact, 
> some already have.
>
> Fixes are a Good Thing, of course, and this particular change's scope
> remains to be seen; but to channel most of the users I meet out there
> in the real world today: Enough with the 3.X changes already, eh?

Now that Python3 is out, the backward compatibility policy for it is
the same as it always was for Python2.  Only the transition from 2
to 3 broke backward compatibility in a significant way.  From here
on, we are as conservative as we always have been at making backward
incompatible changes (that is, we don't do it intentionally without
a good reason and a deprecation cycle, and if we do it unintentionally
it is a regression and treated as such).

--
R. David Murray  www.bitdance.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Oleg Broytman
On Thu, Oct 07, 2010 at 06:35:09PM +0200, M.-A. Lemburg wrote:
> It is well possible that the two are different. Mac OS X is
> just one example. Another common example is having a Unix
> account using the C locale (=ASCII) while working on a UTF-8
> file system.

   My filesystems are always koi8-r, but sometimes I work with programs in
utf-8 locale. Just an example...

Oleg.
-- 
     Oleg Broytman            http://phd.pp.ru/            p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread M.-A. Lemburg
Victor Stinner wrote:
> Hi,
> 
> A PYTHONFSENCODING environment variable was added to Python 3.2: issue #8622.
> This variable introduces an inconsistency because the filesystem and the locale
> encodings can now be different.
> 
> There are (at least) four issues related to this problem. We have 2 choices
> to fix these issues:
> 
>  (a) use the same encoding to encode and decode values (it can be different 
> for each issue)
> 
>  (b) remove PYTHONFSENCODING variable and raise an error if locale and 
> filesystem encodings are different (ensure that both encodings are the same)
> 
> Even if choice (a) is not easy to implement, it is feasible and I already 
> wrote some patches.
> 
> I don't understand how Python interacts with other programs that ignore the
> PYTHONFSENCODING environment variable. It's like Python uses its own "locale".
> 
> Choice (b) looks easy to implement, but... there is the problem of Mac OS X.
> Mac OS X uses utf-8 encoding for the filesystem (and not the locale
> encoding), whereas it looks like the locale encoding is used for the
> command line arguments. See issue #4388 for more information.
> 
> There is also maybe a useful use case for PYTHONFSENCODING, but I don't
> remember which one :-)

You have to differentiate between the meaning of a file system
encoding and the locale:

A file system encoding defines how the applications interact
with the file system.

A locale defines how the user expects to interact with the
application.

It is well possible that the two are different. Mac OS X is
just one example. Another common example is having a Unix
account using the C locale (=ASCII) while working on a UTF-8
file system.

BTW: We added that because, just as with the I/O encoding, you need to be
able to override the setting determined by Python via locale
introspection, which may be wrong. The env var is only meant
as a way to solve encoding problems in special situations where
the locale cannot be used to determine the file system or
input/output encoding.
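As a small illustration of the distinction drawn above, the sketch below asks Python for both encodings and compares them. It uses only sys.getfilesystemencoding, locale.getpreferredencoding and codecs.lookup; the helper name same_codec is made up for this example.

```python
import codecs
import locale
import sys

# How Python talks to the file system.
fs_encoding = sys.getfilesystemencoding()
# What the user's locale says (False: do not call setlocale as a side effect).
locale_encoding = locale.getpreferredencoding(False)


def same_codec(a, b):
    """Compare two encoding names after normalising aliases (utf8 vs UTF-8)."""
    return codecs.lookup(a).name == codecs.lookup(b).name


print("filesystem encoding:", fs_encoding)
print("locale encoding:    ", locale_encoding)
print("identical:          ", same_codec(fs_encoding, locale_encoding))
```

Whether the two agree depends entirely on the platform and environment, which is the point being made in the message.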

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread lutz
Stephen J. Turnbull wrote (giving me an opening to jump in here):
> R. David Murray writes:
> > In other words, my proposed patch only makes email5 1/8 to 1/4
> > broken, instead of half broken as it is now.  But not un-broken
> > enough for Mailman, it sounds like.
>
> IMO, not in the long run.  But realistically, in the applications I
> know of, most desired traffic is conformant, and since there aren't
> any Python 3 email apps yet, this isn't even a regression. :-/

Well, yes there are, and yes it is.  As I pointed out in a thread 
on this list back in June, there are multiple large Python 3 email 
"apps" in the new Programming Python, a book which is about to be 
released, and which will be read by at least tens of thousands of 
people, many of whom will be evaluating the stability of Python 3.

These apps include both a simple webmail site, as well as a more
sophisticated 5k-line tkinter email client -- one which I've been 
using for all my personal and business email over the last 6 months,
and which works well with the email package as it is in 3.1 (albeit
with a bit of workaround code).  This includes support for Unicode,
MIME, headers, attachments, and the lot.

I'm forwarding a link to the code of these clients to David by 
private email in case they might be useful as a test case (O'Reilly
has already posted them ahead of the book, but they may be a bit too
heavy for use in formal testing).

The email package is obviously less than ideal today, and there are
many other clients for it besides my own, of course.  But making it 
backward incompatible at this point is likely to be seen as a big 
negative to newcomers evaluating 3.X viability.  And as I tried to 
make clear in June, this list should carefully weigh the PR cost of 
pulling the rug out from under those brave souls who have already 
taken the time to accommodate the 3.X world you've mandated.

To put that more strongly, the Python user base is much larger than 
this list's readership.  If I'm using 3.1 email, so are many others.
People will accept the 3.X world you make up to a point, but it's 
impossible to code to a moving target, much less base a product on 
it.  At some point, they'll simply stop trying to keep up; in fact, 
some already have.

Fixes are a Good Thing, of course, and this particular change's scope
remains to be seen; but to channel most of the users I meet out there
in the real world today: Enough with the 3.X changes already, eh?

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)





[Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Victor Stinner
Hi,

A PYTHONFSENCODING environment variable was added to Python 3.2: issue #8622.
This variable introduces an inconsistency because the filesystem and the locale
encodings can now be different.

There are (at least) four issues related to this problem. We have 2 choices to 
fix these issues:

 (a) use the same encoding to encode and decode values (it can be different 
for each issue)

 (b) remove PYTHONFSENCODING variable and raise an error if locale and 
filesystem encodings are different (ensure that both encodings are the same)

Even if choice (a) is not easy to implement, it is feasible and I already 
wrote some patches.

I don't understand how Python interacts with other programs that ignore the
PYTHONFSENCODING environment variable. It's like Python uses its own "locale".

Choice (b) looks easy to implement, but... there is the problem of Mac OS X. 
Mac OS X uses utf-8 encoding for the filesystem (and not the locale encoding), 
whereas it looks like the locale encoding is used for the command line 
arguments. See issue #4388 for more information.

There is also maybe a useful use case for PYTHONFSENCODING, but I don't
remember which one :-)


Issues
------

sys.argv:
 - decoded from the locale encoding
 - subprocess encodes process arguments to the filesystem encoding
=> issue #9992

sys.path:
 - decoded from the locale encoding
 - import encodes paths to the filesystem encoding
=> issue #10014

The script name, read on the command line (eg. python script.py), is decoded 
using the locale encoding, whereas it is used to fill sys.path[0] (without any 
encoding conversion) and import encodes paths to the filesystem encoding.
=> issue #10039

PYTHONWARNINGS environment variable:
 - decoded from the locale encoding
 - subprocess encodes environment variables to the filesystem encoding
=> issue #9988
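All four issues share one pattern: bytes are decoded with one encoding and later re-encoded with another. The sketch below shows why that corrupts non-ASCII values. The two encoding names are chosen only for illustration (they stand in for "locale" and "filesystem"), and the roundtrip helper is hypothetical.

```python
def roundtrip(raw_bytes, decode_with, encode_with):
    """Decode with one encoding, re-encode with another.

    surrogateescape is used so that undecodable bytes survive the trip
    instead of raising.
    """
    text = raw_bytes.decode(decode_with, "surrogateescape")
    return text.encode(encode_with, "surrogateescape")


# A non-ASCII argument as the OS would hand it over (UTF-8 bytes here).
raw = "caf\u00e9".encode("utf-8")

# Same encoding in both directions: the bytes come back unchanged.
consistent = roundtrip(raw, "utf-8", "utf-8")

# Decoded as if latin-1, encoded as UTF-8: the bytes are silently mangled.
mismatched = roundtrip(raw, "latin-1", "utf-8")

print(consistent == raw)
print(mismatched == raw)
```

This is choice (a) in miniature: whichever encoding is picked for a given value, it has to be the same one on the way in and on the way out.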

-- 
Victor Stinner
http://www.haypocalc.com/


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 15:00:04 +0900, "Stephen J. Turnbull" wrote:
> R. David Murray writes:
> 
>  > > But that's not interesting; you did that with Python 3.  We want to
>  > Of course I did it with Python3.  It's the Python3 email codebase
>  > I'm working with (and have to work *around*).
> 
> Sure.  My point is that it has nothing to do with the expectations of
> people trying to upgrade their apps to Python 3, and meeting those
> expectations is an important requirement of the specification of
> email5, right?

Well, not necessarily, no.  Python3 broke backward compatibility.
*Some* changes are going to have to be made in user code to make it
work with email5.  Where we can minimize those changes we should,
but it isn't a requirement, no.  With my patch, the minimization will
be message_from_string --> message_from_bytes, message_from_file -->
message_from_binary_file, and in some cases Generator --> BytesGenerator,
for those programs that need to deal with wire format data that is not
7bit clean.  Programs that only *generate* emails should need few
if any changes, but that is already true (that's the half of email
that is working :).
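A minimal sketch of the substitutions listed above, written against the names as they ended up in the Python 3 standard library (email.message_from_bytes and email.generator.BytesGenerator). The sample message and addresses are invented for the illustration.

```python
import email
from email.generator import BytesGenerator
from io import BytesIO

# Wire-format data that is not 7bit clean: note the raw 0xF6 byte in the body.
wire = b"From: sender@example.com\nSubject: test\n\nbody with 8-bit byte: \xf6\n"

# Was: email.message_from_string(text)
msg = email.message_from_bytes(wire)

# Was: Generator(fp).flatten(msg), writing to a text file object.
out = BytesIO()
BytesGenerator(out, mangle_from_=False).flatten(msg)

print(msg["Subject"])
print(out.getvalue())
```

Programs that only read 7bit-clean input can keep using the string-based entry points unchanged.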

> Actually, in context we were not talking about a random character that
> came in from outside, we were talking about U+FFFD that *we*
> generated, and *know* that it's the only non-ASCII character in the
> string because we replaced all the others with it.

Ah, so that *was* what you were suggesting.

> Of course the best we can do with 'From: =?UNKNOWN?Q?p=C3=B6stal' or
> 'From: p\xc3\xb6stal' on input is to save the encoded or raw bytes
> representation and spit it back out on output.

Yes.  And I haven't actually dealt with what to do with non-ascii
characters or RFC2047 unknown-8bit characters when decoding
headers in email6.  In issue 6302 we are talking about adding a
decode_header_to_string method for email5 where the same issue arises,
and so we'll need to make a decision soon.  Presumably we'll use U+FFFD
to replace them (along with registering defects in email6).
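For reference, U+FFFD substitution is what the standard "replace" error handler already does when decoding. The sketch below shows that behaviour; the decode_lossy helper and its boolean "defect" result are hypothetical stand-ins for whatever defect registration email6 ends up using.

```python
def decode_lossy(raw):
    """Decode as ASCII, replacing each undecodable byte with U+FFFD.

    Returns the text plus a flag saying whether anything was replaced
    (a stand-in for registering a defect on the message).
    """
    text = raw.decode("ascii", errors="replace")
    return text, "\ufffd" in text


clean_text, clean_defect = decode_lossy(b"postal")
lossy_text, lossy_defect = decode_lossy(b"p\xc3\xb6stal")

print(repr(clean_text), clean_defect)
print(repr(lossy_text), lossy_defect)
```

Note that each bad byte gets its own replacement character, so a two-byte UTF-8 sequence turns into two U+FFFD characters and the original bytes cannot be recovered.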

> The MIME-charset = UNKNOWN dodge might be a better way of handling
> this.  The str is all ASCII, so won't raise exceptions unless the app
> itself objects to MIME encoded-words for some reason.  OTOH, the
> presence of encoded words will be a red flag to any human viewer, and
> after processing with .flatten(), the receiver is likely to DTRT (from
> the receiving human's point of view, per that human's configuration).

That is a very interesting idea.  It is the *right* thing to do, since it
would mean that a message parsed as bytes could be generated via Generator
and passed to, say, smtplib without losing any information.  However,
it's not exactly trivial to implement, since issues of runs of characters
and line re-wrapping need to be dealt with.  Perhaps Header can be
made to handle bytes in order to do this; I'll have to look into it.
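To make the "unknown charset" idea concrete, here is a deliberately naive sketch that wraps raw header bytes in a single RFC 2047 "Q" encoded word, so the result is pure ASCII yet keeps every original byte. The function name q_encode_unknown is hypothetical, the charset label "unknown-8bit" is an assumption, and the sketch ignores exactly the hard parts mentioned above (runs of characters and re-wrapping long lines).

```python
import string

# Bytes that may appear literally inside a Q-encoded word in this sketch.
_SAFE = (string.ascii_letters + string.digits).encode("ascii")


def q_encode_unknown(raw, charset="unknown-8bit"):
    """Return raw bytes as one ASCII-only encoded word (no line wrapping)."""
    pieces = []
    for byte in raw:
        if byte in _SAFE:
            pieces.append(chr(byte))
        elif byte == 0x20:
            pieces.append("_")            # Q encoding writes space as '_'
        else:
            pieces.append("=%02X" % byte) # everything else as =XX hex
    return "=?%s?Q?%s?=" % (charset, "".join(pieces))


encoded = q_encode_unknown(b"p\xc3\xb6stal")
print(encoded)
```

Because the bytes are preserved inside the encoded word, a receiver that guesses the right charset can still display the header correctly.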

>  > So you are suggesting that I should use U+FFFD encoded as UTF-8
>  > rather than '?' as the substitution character?  But earlier you said
>  > that people would probably rather not be forced to deal with Unicode
>  > just because there are invalid bytes in the message.  So that's
>  > probably not what you meant.
> 
> "Suggest" != "recommend".  Talking to a wider base of users and
> developers, you might or might not find that to be a good idea.  I
> don't think the 800 million or so Chinese coming online in the next
> decade will much care whether you use U+FFFD or '?'.  The Japanese
> would prefer U+2639 WHITE FROWNING FACE or U+270C VICTORY HAND, no
> doubt ("crassly cute" is much beloved here).  Americans will likely
> prefer '?', as they probably have correspondents with legacy systems
> that won't like UTF-8 or perhaps don't have a font to display U+FFFD.

For the moment I think I'll stick with '?', with the idea of "fixing
that bug" by using the unknown charset trick at a later stage.

>  > Presumably you are suggesting that email5 be smart enough to turn my
>  > example into properly UTF-8/CTE encoded text.
> 
> No, in general that's undecidable without asking the originator,
> although humans can often make a good guess.  But not always: Japanese
> are fond of "four-character compound words", and I once found an
> 8-byte sequence (four 2-byte characters) that is idiomatic in both
> Shift JIS and EUC-JP.  Even a dictionary lookup can't determine the
> intended encoding for that sequence.

I was talking about unicode input, though, where you do know (modulo
the language differences that unicode hasn't yet sorted out).

> I'm only saying that any Unicode email-N generates itself can be
> properly encoded.

Agreed.

>  > But *that* problem is what email6 is trying to address.  It just
>  > doesn't look practical to address it directly in the email5 code
>  > base, because the email4 codebase that email5 inherits does not
>  > provide the correct distinction between bytes and text.  email5 is
>  > parsing the input stream *as if* it were ASCII

Re: [Python-Dev] opinions on issue2142 ('\ No newline at end of file' to difflib.unified_diff)?

2010-10-07 Thread Dirkjan Ochtman
On Thu, Oct 7, 2010 at 00:53, Trent Mick  wrote:
> 1. Change `difflib.unified_diff` to emit:
>
>    ['---  \n', '+++  \n', '@@ -1,3 +1,3 @@\n', ' one\n', ' two\n',
> '-three\n', '\ No newline at end of file', '+trois\n', '\ No newline
> at end of file']
>
> instead of:
>
>    ['---  \n', '+++  \n', '@@ -1,3 +1,3 @@\n', ' one\n', ' two\n',
> '-three', '+trois']
>
> for this case.
>
>
> 2. Add a `add_end_of_file_newline_markers_to_make_patch_happy` keyword
> arg (probably with a different name:) to `difflib.unified_diff` to do
> this additional handling. The reason is to not surprise existing code
> that would be surprised with those "\No newline at end of file"
> entries.
>
> 3. Not touch `difflib.unified_diff` and instead update
> http://docs.python.org/library/difflib.html#difflib-interface
> documentation to discuss the issue and show how users of unified_diff
> should handle this case themselves.

Mark (in the issue) argues that there is no specification for diffs,
and that this is thus a feature, not a bug. On the other hand, in
Mercurial we've maintained the idea that diffs are specified by
whatever (GNU) patch(1) accepts. So I would support treating this as a
bug, not just a feature. As such, I think 3.2 should emit the extra
line by default and add a keyword argument to make it easy to revert
to the old behavior (and add docs to 2.7, 3.1 and 3.2 about the
issue!).

Cheers,

Dirkjan


Re: [Python-Dev] opinions on issue2142 ('\ No newline at end of file' to difflib.unified_diff)?

2010-10-07 Thread R. David Murray
On Wed, 06 Oct 2010 15:53:59 -0700, Trent Mick  wrote:
> Soliciting opinions on issue 2142 (http://bugs.python.org/issue2142).
> There are patches available so this isn't vapour. :)
[...]
> Possibilities:
> 
> 1. Change `difflib.unified_diff` to emit:
> 
> ['---  \n', '+++  \n', '@@ -1,3 +1,3 @@\n', ' one\n', ' two\n',
> '-three\n', '\ No newline at end of file', '+trois\n', '\ No newline
> at end of file']
> 
> instead of:
> 
> ['---  \n', '+++  \n', '@@ -1,3 +1,3 @@\n', ' one\n', ' two\n',
> '-three', '+trois']
> 
> for this case.
> 
> 
> 2. Add a `add_end_of_file_newline_markers_to_make_patch_happy` keyword
> arg (probably with a different name:) to `difflib.unified_diff` to do
> this additional handling. The reason is to not surprise existing code
> that would be surprised with those "\No newline at end of file"
> entries.
> 
> 3. Not touch `difflib.unified_diff` and instead update
> http://docs.python.org/library/difflib.html#difflib-interface
> documentation to discuss the issue and show how users of unified_diff
> should handle this case themselves.
> 
> Thoughts?

I don't think (1) is a good option, both for backward compatibility
reasons and (as mentioned in the ticket IIRC) because not all programs
using difflib use it to generate diffs for direct output.  (2) might
be worth doing: there is a "standard" to follow, so it makes sense to
code that standard into the stdlib.

> Orthogonal: *After* a decision is made for the Python 3.3 tree we can
> discuss if including this in either of Python 2.7 or 3.2 would be
> wanted.

(3) is the only option for 2.7/3.1.  We're still pre-beta on 3.2, so
(2) is still an option there.  3.3 doesn't enter the discussion until
after 3.2 beta 1.
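For option (3), where users handle the case themselves, a wrapper along the following lines would be one way to do it. This is only a sketch: the name unified_diff_with_markers is hypothetical, and it assumes the default lineterm of "\n".

```python
import difflib

NO_NEWLINE = "\\ No newline at end of file\n"


def unified_diff_with_markers(a, b, **kwargs):
    """Yield unified_diff lines, adding the marker patch(1) expects.

    Any diff line that lacks a trailing newline came from a file that did
    not end in one, so terminate it and emit the marker line after it.
    """
    for line in difflib.unified_diff(a, b, **kwargs):
        if line.endswith("\n"):
            yield line
        else:
            yield line + "\n"
            yield NO_NEWLINE


old = ["one\n", "two\n", "three"]
new = ["one\n", "two\n", "trois"]
result = list(unified_diff_with_markers(old, new))
print("".join(result), end="")
```

Code that consumes the raw unified_diff output is unaffected, which is the backward-compatibility concern raised against option (1).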

--
R. David Murray  www.bitdance.com


[Python-Dev] python-2.6.6 coredump running newspipe

2010-10-07 Thread Thomas Klausner
Hi!

I'm running newspipe-1.1.9, an RSS reader
(http://newspipe.sourceforge.net/), on NetBSD-5.99.11/amd64 using
Python-2.6.6.

Sometimes, it core dumps with particular feeds in the configuration (I
guess depending on the feed, because when I comment out the offending
feed in the opml file, it runs through to completion).

The backtrace looks like this:
Core was generated by `python'.
Program terminated with signal 10, Bus error.
#0  0x7f7ffdc35a21 in PyOS_snprintf (str=0x7f7ff5dfe3d8 "@", size=120, 
format=0x1 ) at Python/mysnprintf.c:43
43  {
(gdb) bt
#0  0x7f7ffdc35a21 in PyOS_snprintf (str=0x7f7ff5dfe3d8 "@", size=120, 
format=0x1 ) at Python/mysnprintf.c:43
#1  0x7f7ffdc471a6 in PyOS_ascii_formatd (buffer=0x7f7ff5dfe3d8 "@", 
buf_size=120, format=0x7f7ff5dfe388 "%.2f", d=0.15256118774414062) at 
Python/pystrtod.c:455
#2  0x7f7ffdbaa7fa in formatfloat (buf=0x7f7ff5dfe3d8 "@", buflen=120, 
flags=16, prec=2, type=102, v=0x7f7ffcc6d510) at Objects/stringobject.c:4378
#3  0x7f7ffdbabd32 in PyString_Format (format=0x7f7ffc8144e0, 
args=0x7f7ffcc6d510) at Objects/stringobject.c:4943
#4  0x7f7ffdbaa3b0 in string_mod (v=0x7f7ffc8144e0, w=0x7f7ffcc6d510) at 
Objects/stringobject.c:4116
#5  0x7f7ffdb459db in binary_op1 (v=0x7f7ffc8144e0, w=0x7f7ffcc6d510, 
op_slot=32) at Objects/abstract.c:917
#6  0x7f7ffdb45c81 in binary_op (v=0x7f7ffc8144e0, w=0x7f7ffcc6d510, 
op_slot=32, op_name=0x7f7ffdc6c089 "%") at Objects/abstract.c:969
#7  0x7f7ffdb467ad in PyNumber_Remainder (v=0x7f7ffc8144e0, 
w=0x7f7ffcc6d510) at Objects/abstract.c:1221
#8  0x7f7ffdc08a03 in PyEval_EvalFrameEx (f=0x7f7fefa1dab0, throwflag=0) at 
Python/ceval.c:1180
#9  0x7f7ffdc1175f in fast_function (func=0x7f7ff8a9bed8, 
pp_stack=0x7f7ff5dfeae8, n=1, na=1, nk=0) at Python/ceval.c:3836
#10 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dfeae8, oparg=1) at 
Python/ceval.c:3771
#11 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7fee920420, throwflag=0) at 
Python/ceval.c:2412
#12 0x7f7ffdc0f715 in PyEval_EvalCodeEx (co=0x7f7ffcc247b0, 
globals=0x7f7ffd1c5880, locals=0x0, args=0x7f7ff5b0aac8, argcount=8, 
kws=0x7f7ff5b0ab08, kwcount=0, defs=0x7f7ff8d3c4e8,
defcount=5, closure=0x0) at Python/ceval.c:3000
#13 0x7f7ffdc1184a in fast_function (func=0x7f7ff8a9cc80, 
pp_stack=0x7f7ff5dfeff8, n=8, na=8, nk=0) at Python/ceval.c:3846
#14 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dfeff8, oparg=7) at 
Python/ceval.c:3771
#15 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7ff5b0a820, throwflag=0) at 
Python/ceval.c:2412
#16 0x7f7ffdc1175f in fast_function (func=0x7f7ff8a9e140, 
pp_stack=0x7f7ff5dff358, n=1, na=1, nk=0) at Python/ceval.c:3836
#17 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dff358, oparg=0) at 
Python/ceval.c:3771
#18 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7ff5b0a420, throwflag=0) at 
Python/ceval.c:2412
#19 0x7f7ffdc1175f in fast_function (func=0x7f7ffca1db90, 
pp_stack=0x7f7ff5dff6b8, n=1, na=1, nk=0) at Python/ceval.c:3836
#20 0x7f7ffdc11565 in call_function (pp_stack=0x7f7ff5dff6b8, oparg=0) at 
Python/ceval.c:3771
#21 0x7f7ffdc0d81f in PyEval_EvalFrameEx (f=0x7f7ff5b03190, throwflag=0) at 
Python/ceval.c:2412
#22 0x7f7ffdc0f715 in PyEval_EvalCodeEx (co=0x7f7ffca0d4e0, 
globals=0x7f7ffca473a0, locals=0x0, args=0x7f7ff04d3e68, argcount=1, kws=0x0, 
kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:3000
#23 0x7f7ffdb7a612 in function_call (func=0x7f7ffca1daa0, 
arg=0x7f7ff04d3e50, kw=0x0) at Objects/funcobject.c:524
#24 0x7f7ffdb495e8 in PyObject_Call (func=0x7f7ffca1daa0, 
arg=0x7f7ff04d3e50, kw=0x0) at Objects/abstract.c:2492
#25 0x7f7ffdb5eca0 in instancemethod_call (func=0x7f7ffca1daa0, 
arg=0x7f7ff04d3e50, kw=0x0) at Objects/classobject.c:2579
#26 0x7f7ffdb495e8 in PyObject_Call (func=0x7f7ff8ac2a00, 
arg=0x7f7ffd112050, kw=0x0) at Objects/abstract.c:2492
#27 0x7f7ffdc10cd3 in PyEval_CallObjectWithKeywords (func=0x7f7ff8ac2a00, 
arg=0x7f7ffd112050, kw=0x0) at Python/ceval.c:3619
#28 0x7f7ffdc4e69f in t_bootstrap (boot_raw=0x7f7ffd1b4590) at 
./Modules/threadmodule.c:428
#29 0x7f7ffd90ba32 in pthread_setcancelstate () from 
/usr/lib/libpthread.so.1
#30 0x7f7ffd26e9b0 in ___lwp_park50 () from /usr/lib/libc.so.12
#31 0x in ?? ()
(gdb) fr 1
#1  0x7f7ffdc471a6 in PyOS_ascii_formatd (buffer=0x7f7ff5dfe3d8 "@", 
buf_size=120, format=0x7f7ff5dfe388 "%.2f", d=0.15256118774414062) at 
Python/pystrtod.c:455
455 PyOS_snprintf(buffer, buf_size, format, d);
(gdb) l
450 format = tmp_format;
451 }
452
453
454 /* Have PyOS_snprintf do the hard work */
455 PyOS_snprintf(buffer, buf_size, format, d);
456
457 /* Do various fixups on the return string */
458
459 /* Get the current locale, and find the decimal point string.
(gdb) p format
$1 = 0x7f7ff5dfe388 "%.2f"
(gdb) fr 0
#0  0x7f7ffdc35a21 in PyOS_snprintf (str

[Python-Dev] ConFoo 2011 Call for speakers

2010-10-07 Thread Cyril Robert
Greetings Python developers,

We, Montréal-Python, are the coordinators of the Python track at
ConFoo 2011 and we are very proud to announce our call for speakers.

PHP-Québec, Montréal-Python, Montreal.rb, W3Qc, and OWASP Montréal are
organizing the first edition of the ConFoo conference, which will be
held in Montréal on March 9th through 11th at the Hilton Bonaventure
Hotel. With over 500 expected attendees, ConFoo is one of the largest
Web development conferences in North America.

ConFoo is about the Web, but it's also about showcasing the strengths
of different technologies. Do you think that Python beats all the
other languages out there for Web development? Do you think that
Python knocks Perl 6's socks off? Come and tell us why!

Sessions are one hour long and you can present in French or in
English, whichever you prefer. Submissions are due by November 26;
for more details, visit the ConFoo website:

 http://confoo.ca/en/call-for-papers

By the way, we are perfectly aware that there is a slight clash with
PyCon. You should not worry about that since all Python talks will
happen before Friday, giving you plenty of time for the commute
towards Atlanta.

Share the word!


Re: [Python-Dev] hashlib bug when built on OS X 10.6 for 10.5

2010-10-07 Thread Ronald Oussoren

On 6 Oct, 2010, at 22:26, Arnaud Bergeron wrote:

> If you build python (at least 2.6.5, but probably other versions as
> well) in a universal setup using the following command (or similar)
> while the machine currently has 10.6 installed:
> 
> ./configure --with-universal-archs=
> --enable-universalsdk=/Developer/SDKs/MacOSX10.5.sdk/
> 
> then the hashlib module will be unimportable with the following error:
> 
> ImportError: No module named _md5
> 
> This is because the openssl detection code in setup.py picks up the
> system libssl (in /) which is 0.9.8 and chooses not to build the
> additional _sha256 and _sha512 modules.  Then, when the build
> proceeds, the SDK libssl is used (in /Developer/SDKs/MacOSX10.5.sdk/)
> which is version 0.9.7, and the openssl_sha256 and openssl_sha512 raise
> ValueError in the _hashlib that is built.

This definitely looks like a bug I fixed before, see 
. The patch should now be present in 2.7 and 
any of the active branches.

I can build hashlib just fine with the HEAD of the release26-maint branch, 
could you please test with 2.6.6?

BTW. In general it is better to file bugs at bugs.python.org instead of mailing 
them here, using the bugtracker ensures that your report doesn't get lost.
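When testing a rebuilt interpreter, a quick check like the one below can show which digests are actually usable. It is only a sketch: usable_hashes is a hypothetical helper, and on a build with the problem described above the "import hashlib" line itself would fail with the quoted ImportError, which is an equally useful signal.

```python
import hashlib


def usable_hashes(names=("md5", "sha1", "sha256", "sha512")):
    """Map each algorithm name to True if hashlib can construct it."""
    results = {}
    for name in names:
        try:
            hashlib.new(name, b"")
        except ValueError:
            # hashlib.new raises ValueError for unsupported algorithms.
            results[name] = False
        else:
            results[name] = True
    return results


for name, ok in sorted(usable_hashes().items()):
    print(name, "ok" if ok else "MISSING")
```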

Ronald



Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
Stephen J. Turnbull writes:
> R. David Murray writes:
>  > We're (in the current patch) not punting on handling non-conforming
>  > email, we're punting on handling non-conforming bytes *if the headers
>  > that contain them need to be modified*.  The headers can still be
>  > modified, you just (currently) lose the non-ASCII bytes in the process.
> 
> Modified *or examined*.  I can't think of any important applications
> offhand that *need* to examine the non-ASCII bytes (in particular,
> Mailman doesn't need to do that).  Verbatim copying of the bytes
> themselves is almost always the desired usage.

Mmm.  Yes, or examined.  If we allow escaped bytes to be returned, perhaps
we also should provide a helper that "unescapes" the bytes and returns
the byte string (yes, this is just a call to encode, but by wrapping it
we continue to hide the surrogateescape implementation detail.)
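The helper described above could be as small as the following sketch. The name unescape_bytes is hypothetical; the only real API involved is the "surrogateescape" error handler from PEP 383.

```python
def unescape_bytes(value):
    """Turn a surrogate-escaped header string back into the original bytes.

    Hypothetical wrapper: it is just a call to encode, but callers do not
    need to know that surrogateescape is the mechanism underneath.
    """
    return value.encode("ascii", "surrogateescape")


original = b"p\xc3\xb6stal"
# What a bytes-aware parser would hand back for a non-ASCII header.
escaped = original.decode("ascii", "surrogateescape")

print(repr(escaped))
print(unescape_bytes(escaped))
```

The round trip is lossless, so verbatim copying of the bytes stays possible even though the API deals in str.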

>  > And robustness is not the issue, only extended-beyond-the-RFCs handling
>  > of non-conforming bytes would be an issue.
> 
> And with that, I'm certain that Jon Postel is really dead. 

A goal for email6 is to be *at least* as Postel compliant as email4.
The goal for my patch is to make email5.1 more Postel compliant than
email5.0 is :)

>  > > (Surely you are not saying that Generator.flatten can't DTRT with
>  > > non-ASCII content *at all*?)
>  > 
>  > Yes, that is *exactly* what I am saying:
>  > 
>  > >>> m = email.message_from_string("""\
>  > ... From: pöstal
>  > ...   
>  > ... """)
>  > >>> str(m)
>  > Traceback (most recent call last):
>  >   
>  > UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 1: ordinal not in range(128)
> 
> But that's not interesting; you did that with Python 3.  We want to

Of course I did it with Python3.  It's the Python3 email codebase
I'm working with (and have to work *around*).

> know what people porting from Python 2 will expect.  So, in 2.5.5 or
> 2.6.6 on Mac, with email v4.0.2, it *doesn't* raise, it returns
> 
> wideload:~ 4:14$ python
> Python 2.5.5 (r255:77872, Jul 13 2010, 03:03:57) 
> [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import email
> >>> m=email.message_from_string('From: pöstal\n\n')
> >>> str(m)
> 'From nobody Thu Oct  7 04:18:25 2010\nFrom: p\xc3\xb6stal\n\n'
> >>> m['From']
> 'p\xc3\xb6stal'
> >>> 
> 
> That's hardly helpful!  Surely we can and should do better than that
> now, especially since UTF-8 (with a proper CTE) is now almost
> universally acceptable to MUAs.  When would it be a problem for that
> to return
> 
> 'From nobody Thu Oct  7 04:18:25 2010\nFrom: =?UTF-8?Q?p=C3=B6stal?=\n\n'

What's wrong with that is that when we parse the bytes of the message
we don't know that b'\xc3\xb6' == '=?UTF-8?Q?=C3=B6?='.  It isn't even
all that likely to be true, since I would guess that latin1 is still
more common than utf-8 (but you might know better).
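
To illustrate the ambiguity (a sketch, not code from the patch): the
same two bytes decode without error under both charsets, so the parser
has no way to tell which one the sender meant.

```python
raw = b"p\xc3\xb6stal"

# Both decodes succeed; only one of them is what the sender intended.
as_utf8 = raw.decode("utf-8")      # one character, o with diaeresis
as_latin1 = raw.decode("latin-1")  # two characters, U+00C3 then U+00B6

# Going the other way, a latin-1 sender would have put a single byte
# on the wire, and that byte sequence is not valid UTF-8 at all.
latin1_wire = "p\u00f6stal".encode("latin-1")
```

Labelling the bytes as UTF-8 in an encoded word is therefore a guess
unless the charset is known from somewhere else.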

>  > Remember, email5 is a direct translation of email4, and email4 only
>  > handled ASCII and oh-by-the-way-if-there-are-bytes-along-for-the-
>  > ride-fine-we'll-pass-them-along.  So if you want to put non-ASCII
>  > data into a message you have to encode it properly to ASCII in
>  > exactly the same way that you did in email4:
> 
> But if you do it right, then it will still work in a version that just
> encodes non-ASCII characters in UTF-8 with the appropriate CTE.  Since
> you'll never be passing it non-ASCII characters, it's already ASCII
> and UTF-8, and no CTE will be needed.

So you are suggesting that I should use U+FFFD encoded as UTF-8
rather than '?' as the substitution character?  But earlier you said
that people would probably rather not be forced to deal with Unicode
just because there are invalid bytes in the message.  So that's
probably not what you meant.

Presumably you are suggesting that email5 be smart enough to turn my
example into properly UTF-8/CTE encoded text.  But *that* problem is what
email6 is trying to address.  It just doesn't look practical to address it
directly in the email5 code base, because the email4 codebase that email5
inherits does not provide the correct distinction between bytes and text.
email5 is parsing the input stream *as if* it were ASCII-only CTE text.
I'm trying to extend it to also handle non-ASCII bytes gracefully.
Extending it to actually handle unicode input is a whole different kettle
of sushi[*].

>  > Yes, exactly.  I need to fix the patch to recode using, say,
>  > quoted-printable in that case.
> 
> It really should check for proportions of non-ASCII.  QP would be
> horrible for Japanese or Chinese.

Noted.
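
One way to act on that, sketched under assumptions: rather than
counting non-ASCII bytes directly, encode the payload both ways and
keep the shorter result.  The function name `choose_cte` is
hypothetical and not part of the patch.

```python
import base64
import quopri


def choose_cte(payload):
    """Return the name of the CTE that encodes ``payload`` (bytes) shorter.

    Hypothetical helper.  QP costs three characters per non-ASCII
    byte, base64 costs four characters per three bytes, so base64
    wins once a large share of the bytes is non-ASCII.
    """
    qp_size = len(quopri.encodestring(payload))
    b64_size = len(base64.encodebytes(payload))
    return "quoted-printable" if qp_size <= b64_size else "base64"


mostly_ascii = "Hello pöstal, how are you today?".encode("utf-8")
japanese = "こんにちは世界".encode("utf-8")
```

With these inputs `choose_cte(mostly_ascii)` picks quoted-printable and
`choose_cte(japanese)` picks base64.  Encoding twice is wasteful for
large bodies; counting bytes >= 0x80 in a sample would be cheaper.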

>  > DecodedGenerator could still produce the unicode, though, which is
>  > what I believe we want.  (Although that raises the question of
>  > whether DecodedGenerator should also decode the RFC2047 encoded
>  > headers, but that raises a backward compatibility issue).
> 
> Can't really help you there.  While I would

[Python-Dev] opinions on issue2142 ('\ No newline at end of file' to difflib.unified_diff)?

2010-10-07 Thread Trent Mick
Soliciting opinions on issue 2142 (http://bugs.python.org/issue2142).
There are patches available so this isn't vapour. :)

The issue is this (I'll discuss only unified_diff(), but the same
applies to context_diff()):

>>> from difflib import *
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
...                    "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three+trois

Whereas with `diff`, `hg` and `git`:

diff -r 667b0870428d a
--- a/a Wed Oct 06 15:39:50 2010 -0700
+++ b/a Wed Oct 06 15:40:31 2010 -0700
@@ -1,3 +1,3 @@
 one
 two
-three
\ No newline at end of file
+trois
\ No newline at end of file



While originally marked as a *bug*, the issue was changed to be a
*feature* request, because arguably `difflib.unified_diff()` is fine,
and the problem is in the naive use of the following to create a patch
that will work with `patch`:

   ''.join(gen)


Possibilities:

1. Change `difflib.unified_diff` to emit:

['---  \n', '+++  \n', '@@ -1,3 +1,3 @@\n', ' one\n', ' two\n',
'-three\n', '\ No newline at end of file', '+trois\n',
'\ No newline at end of file']

instead of:

['---  \n', '+++  \n', '@@ -1,3 +1,3 @@\n', ' one\n', ' two\n',
'-three', '+trois']

for this case.


2. Add a `add_end_of_file_newline_markers_to_make_patch_happy` keyword
arg (probably with a different name:) to `difflib.unified_diff` to do
this additional handling. The reason is to avoid surprising existing
code that does not expect those "\ No newline at end of file"
entries.

3. Not touch `difflib.unified_diff` and instead update
http://docs.python.org/library/difflib.html#difflib-interface
documentation to discuss the issue and show how users of unified_diff
should handle this case themselves.
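
For option 3, a minimal sketch of what the documented workaround could
look like.  The wrapper name `unified_diff_with_eol_markers` is
hypothetical, and the marker placement follows the `diff` output shown
above rather than any patch attached to the issue:

```python
import difflib

NO_EOL_MARKER = "\\ No newline at end of file\n"


def unified_diff_with_eol_markers(a, b, **kwargs):
    """Wrap difflib.unified_diff so the result is safe to ''.join().

    Hypothetical wrapper: any emitted line that lacks a trailing
    newline gets one, followed by the marker line that `patch` expects.
    """
    for line in difflib.unified_diff(a, b, **kwargs):
        if line.endswith("\n"):
            yield line
        else:
            yield line + "\n"
            yield NO_EOL_MARKER


old = "one\ntwo\nthree".splitlines(True)
new = "one\ntwo\ntrois".splitlines(True)
patch_text = "".join(unified_diff_with_eol_markers(old, new))
```

This assumes the default `lineterm='\n'`; with `lineterm=''` every line
would lack a newline and the check would need to be different.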

Thoughts?



Orthogonal: *After* a decision is made for the Python 3.3 tree we can
discuss if including this in either of Python 2.7 or 3.2 would be
wanted.

-- 
Trent Mick
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 03:31:34 +0900, "Stephen J. Turnbull"  
wrote:
> R. David Murray writes:
> 
>  >   5.  Return the content, with non-ASCII bytes replaced with ?
>  >   characters.
> 
> That hadn't occurred to me (and it makes me sick to contemplate it).
> 
> That said, this is probably good enough for Mailman-like apps to limp
> along for "most" users.  It's certainly good enough for the "might
> kick your wife and elope with your dog" alpha ports of Mailman to
> Python 3 (well, as certain as I can be; of course in the end Barry
> decides).  Assuming reasonable backward compatibility of the API, of
> course!

Yeah, "good enough" is pretty much the goal here.

>  > In other words, my proposed patch only makes email5 1/8 to 1/4
>  > broken, instead of half broken as it is now.  But not un-broken
>  > enough for Mailman, it sounds like.
> 
> IMO, not in the long run.  But realistically, in the applications I
> know of, most desired traffic is conformant, and since there aren't
> any Python 3 email apps yet, this isn't even a regression. :-/
> 
> I do think that it's important that the parsed object be able to tell
> you what fields are there (except if the field name itself is invalid)
> and return field bodies parsed as far as possible.

Well, email doesn't currently parse the bodies any further by itself.
You have to call parsing routines to get further parsing.  So maybe
what I should do is work on finalizing the patch without addressing the
'give me the escaped bytes' issue, and then prepare a follow-on patch
that adds that keyword and adjusts the header parsing helpers accordingly.

>  > If we go this route (as opposed to only handling headers with 8bit data by
>  > sanitizing them), then we need to think about the email5 header parsers
>  > as well (decode_header and parseaddr).  They are of course going to have
>  > the same problems as the rest of the email package with parsing bytes,
>  > and you are suggesting that access to those header 8bit bytes is needed.
> 
> Yes, that would be preferable to replacing them with ASCII junk.
> 
> But I don't see any problem with parsing them; they're syntactically
> insignificant by definition.  The problem is purely on output: do I
> get verbatim escaped bytes, a sanitized str, or an exception?

Right, the needed changes should be sanitizing by default, and providing
the keyword to get the escaped bytes.  Mostly it'll be writing tests :)
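
Roughly, and only as a sketch: the function name `header_text` is
hypothetical, and the sketch assumes the parser stores non-ASCII bytes
as surrogateescape code points.  Sanitizing is the default, and the
`escape_bytes` keyword hands back the escaped form untouched.

```python
def header_text(value, escape_bytes=False):
    """Return a header value, sanitized unless escape_bytes is true.

    Hypothetical helper.  Surrogate-escaped bytes live in the range
    U+DC80..U+DCFF; by default each one is replaced with '?'.
    """
    if escape_bytes:
        return value
    return "".join("?" if "\udc80" <= ch <= "\udcff" else ch for ch in value)


parsed = b"p\xc3\xb6stal".decode("ascii", "surrogateescape")
sanitized = header_text(parsed)
escaped = header_text(parsed, escape_bytes=True)
```

Note that a two-byte UTF-8 character yields two '?' characters, since
the replacement works byte by byte.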

>  > Does my proposal make sense?  But note, it raises exactly the backward
>  > compatibility concerns you mention in your next email (that I will reply
>  > to next).  It is an open question whether it is worth opening that door
>  > in order to be able to do extended handling on non-RFC conforming email
>  > (as opposed to just sanitizing it and soldiering on).
> 
> Well, maybe not.  However, it is not obvious to me that you won't run
> into these issues again in Email6.  Applications that think of email
> as textual objects are going to want to make their own choices about
> handling of non-conforming email, and it's likely to be massively
> inconvenient to say "OK, but you have to use bytes interfaces
> exclusively, because the str interfaces don't handle that."

The strategy in email6 so far is for the application program to be
able to access *any piece* of the parsed data as either text or bytes,
and for the header parsers to record defects when there are non-ASCII
bytes where there aren't supposed to be.  So the application can check
for defects and retrieve, say, the comment field that has the non-ASCII
*as bytes* and decode it.  Or, if it doesn't care about parsing them,
it just modifies the fields it wants to modify that *are* valid, and the
invalid non-ASCII comment gets carried along and emitted when the message
is serialized as bytes.

This is more or less what we are talking about enabling in email5 with
the 'escape_bytes=True' keyword, it's just a less structured and more
error prone approach to it than what we have planned for email6.

--
R. David Murray  www.bitdance.com


[Python-Dev] hashlib bug when built on OS X 10.6 for 10.5

2010-10-07 Thread Arnaud Bergeron
If you build python (at least 2.6.5, but probably other versions as
well) in a universal setup using the following command (or similar)
while the machine currently has 10.6 installed:

./configure --with-universal-archs=
--enable-universalsdk=/Developer/SDKs/MacOSX10.5.sdk/

then the hashlib module will be unimportable with the following error:

ImportError: No module named _md5

This is because the openssl detection code in setup.py picks up the
system libssl (in /) which is 0.9.8 and selects not to build the
additional _sha256 and _sha512 modules.  Then, when the build
proceeds, the SDK libssl is used (in /Developer/SDKs/MacOSX10.5.sdk/)
which is version 0.9.7 and the openssl_sha256 and openssl_sha512 raise
ValueError in the _hashlib that is built.

Then the fancy code in hashlib.py detects the ValueError, tries to
import _sha256 and gets an ImportError which is caught by the code set
to catch an ImportError of _hashlib, and this code tries to import
_md5 which also fails and gives the error above.
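
The control flow can be modelled like this.  It is a simplified
stand-in, not the actual hashlib.py source: `FakeHashlib`,
`fake_import` and `pick_constructor` are all made up for illustration.

```python
class FakeHashlib:
    """Stand-in for a _hashlib linked against OpenSSL 0.9.7."""

    def new(self, name):
        if name in ("sha256", "sha512"):
            raise ValueError("unsupported hash type")
        return "openssl_" + name


def fake_import(name, available):
    """Mimic an import against a dict of the modules that were built."""
    if name not in available:
        raise ImportError("No module named " + name)
    return available[name]


def pick_constructor(name, available):
    try:
        _hashlib = fake_import("_hashlib", available)
        try:
            return _hashlib.new(name)
        except ValueError:
            # Fall back to the builtin module, which was never built.
            return fake_import("_" + name, available)
    except ImportError:
        # Meant for a missing _hashlib, but it also catches the
        # ImportError raised by the fallback just above.
        return fake_import("_md5", available)


built = {"_hashlib": FakeHashlib()}
try:
    pick_constructor("sha256", built)
    error_message = None
except ImportError as exc:
    error_message = str(exc)
```

In this model `error_message` ends up naming _md5 even though the
module that is really missing is _sha256.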

In summary, the code in setup.py finds the wrong library, and this
creates a situation that hashlib.py is not prepared to handle.

If the analysis is not clear on certain points, feel free to ask questions.

Arnaud