Re: [Python-Dev] Python 2.7 Won't Build

2010-09-17 Thread Victor Stinner
On Friday 17 September 2010 at 00:09:09, Tom Browder wrote:
> I did, and eventually discovered the problem: I tried to "nosy" Barry
> as requested by adding his e-mail address, but that causes an error in
> the tracker.  After I finally figured that out, I successfully entered
> the original bug (and reported it on the "tracker bug").

http://bugs.python.org/issue9880

Ah, yes, you have to add nicknames, not emails. Barry's nickname is "barry", and 
he's already on the nosy list (because he replied to your issue).

-- 
Victor Stinner
http://www.haypocalc.com/
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-17 Thread Baptiste Carvello

R. David Murray wrote:

I'm trying one approach in email6:
Bytes and String subclasses, where the subclasses have an attribute
named 'literals' derived from a utility module that does this:

literals = dict(
    empty = '',
    colon = ':',
    newline = '\n',
    space = ' ',
    tab = '\t',
    fws = ' \t',
    headersep = ': ',
)

class _string_literals:
    pass

class _bytes_literals:
    pass

for name, value in literals.items():
    setattr(_string_literals, name, value)
    setattr(_bytes_literals, name, bytes(value, 'ASCII'))
del literals, name, value

And the subclasses do:

class BytesHeader(BaseHeader):
    lit = email.utils._bytes_literals

class StringHeader(BaseHeader):
    lit = email.utils._string_literals
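The payoff of this pattern can be shown with a self-contained sketch (this is not the actual email6 code; the `fold` method and the literal values here are illustrative only):

```python
# Sketch of the literals pattern above (not the actual email6 code;
# `fold` and the literal values are illustrative). The same method
# body works for both str and bytes because every literal is reached
# through the class attribute `lit`.
class _string_lit:
    headersep = ': '
    newline = '\n'

class _bytes_lit:
    headersep = b': '
    newline = b'\n'

class BaseHeader:
    def fold(self, name, value):
        # written once, polymorphic over str/bytes
        return name + self.lit.headersep + value + self.lit.newline

class StringHeader(BaseHeader):
    lit = _string_lit

class BytesHeader(BaseHeader):
    lit = _bytes_lit

assert StringHeader().fold('To', 'barry') == 'To: barry\n'
assert BytesHeader().fold(b'To', b'barry') == b'To: barry\n'
```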



I've just written a decorator which applies a similar strategy for insulated 
functions, by passing them an appropriate namespace as an argument. It could be 
useful in cases where only a few functions are polymorphic, not a full class or 
module.


http://code.activestate.com/recipes/577393-decorator-for-writing-polymorphic-functions/
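For readers who don't follow the link, the idea might be sketched roughly like this (the names and namespaces here are invented for illustration, not taken from the recipe):

```python
import functools

# Rough sketch of the decorator idea (names invented here, not taken
# from the recipe): inspect the first argument's type and pass the
# matching literals namespace to the function as an extra parameter.
class _str_ns:
    colon = ':'

class _bytes_ns:
    colon = b':'

def polymorphic(func):
    @functools.wraps(func)
    def wrapper(arg, *args, **kwargs):
        ns = _bytes_ns if isinstance(arg, (bytes, bytearray)) else _str_ns
        return func(arg, *args, lit=ns, **kwargs)
    return wrapper

@polymorphic
def header_name(line, lit):
    # works unchanged for str and bytes input
    return line.split(lit.colon, 1)[0]

assert header_name("From: barry") == "From"
assert header_name(b"From: barry") == b"From"
```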

Cheers, B.



Re: [Python-Dev] [issue1633863] AIX: configure ignores $CC

2010-09-17 Thread Sébastien Sablé

Hi Martin,

I have started to correct quite a lot of issues I have with Python on 
AIX, and since I had to test quite a lot of patches, I thought it would be 
more convenient to set up a buildbot for that platform.


So I now have a buildbot environment with 2 slaves (AIX 5.3 and 6.1) 
that builds and tests Python (branch py3k) with both gcc and xlc (the 
native AIX compiler). I have 4 builders ("py3k-aix6-xlc", 
"py3k-aix5-xlc", "py3k-aix6-gcc", "py3k-aix5-gcc").


I expect to add 4 more builders for branch 2.7 in the coming days.

I would like to share the results of this buildbot to the Python 
community so that issues with AIX could be addressed more easily.


R. David Murray pointed me to the page on the python wiki concerning 
buildbot. It is stated there that it is possible to connect some slaves 
to some official Python buildbot master.


Unfortunately, I don't think this solution is possible for me: I don't 
think the security team in my company would appreciate a server 
inside our network running arbitrary shell commands provided by an 
external source. Neither can I expose the buildbot master web interface.


Also I had to customize the buildbot rules in order to work with some 
specificities of AIX (see attached master.cfg), and I can't guarantee 
that this buildbot will run 24 hours a day; I may have to schedule it 
only once at night, for example, if it consumes too many resources.


(And the results are very unstable at the moment, mostly because of 
issue 9862).


On the other hand, I could upload the build results with rsync or scp 
somewhere or set up a MailNotifier if that can help.


How do you think I could share those results?

regards

--
Sébastien Sablé



On 15/09/2010 23:28, R. David Murray wrote:


R. David Murray  added the comment:

Sébastien, you could email Martin (tracker id loewis) about adding your 
buildbot to our unstable fleet (or even to stable if it is stable; that is, the 
tests normally pass and don't randomly fail).  As long as you are around to 
help fix bugs, it would be great to have an AIX buildbot in our buildbot fleet.

(NB: see also http://wiki.python.org/moin/BuildBot, which unfortunately is a 
bit out of date...)

--
nosy: +r.david.murray

___
Python tracker

___


# -*- python -*-
# ex: set syntax=python:

# This is a sample buildmaster config file. It must be installed as
# 'master.cfg' in your buildmaster's base directory (although the filename
# can be changed with the --basedir option to 'mktap buildbot master').

# It has one job: define a dictionary named BuildmasterConfig. This
# dictionary has a variety of keys to control different aspects of the
# buildmaster. They are documented in docs/config.xhtml .

# This is the dictionary that the buildmaster pays attention to. We also use
# a shorter alias to save typing.
c = BuildmasterConfig = {}

### BUILDSLAVES

# the 'slaves' list defines the set of allowable buildslaves. Each element is
# a BuildSlave object, which is created with bot-name, bot-password.  These
# correspond to values given to the buildslave's mktap invocation.
from buildbot.buildslave import BuildSlave
c['slaves'] = [BuildSlave("phenix", "bot1passwd", max_builds=1),
   BuildSlave("sirius", "bot2passwd", max_builds=1)]

# to limit to two concurrent builds on a slave, use
#  c['slaves'] = [BuildSlave("bot1name", "bot1passwd", max_builds=2)]


# 'slavePortnum' defines the TCP port to listen on. This must match the value
# configured into the buildslaves (with their --master option)

c['slavePortnum'] = 9989

### CHANGESOURCES

# the 'change_source' setting tells the buildmaster how it should find out
# about source code changes. Any class which implements IChangeSource can be
# put here: there are several in buildbot/changes/*.py to choose from.

from buildbot.changes.pb import PBChangeSource
c['change_source'] = PBChangeSource()

# For example, if you had CVSToys installed on your repository, and your
# CVSROOT/freshcfg file had an entry like this:
#pb = ConfigurationSet([
#(None, None, None, PBService(userpass=('foo', 'bar'), port=4519)),
#])

# then you could use the following buildmaster Change Source to subscribe to
# the FreshCVS daemon and be notified on every commit:
#
#from buildbot.changes.freshcvs import FreshCVSSource
#fc_source = FreshCVSSource("cvs.example.com", 4519, "foo", "bar")
#c['change_source'] = fc_source

# or, use a PBChangeSource, and then have your repository's commit script run
# 'buildbot sendchange', or use contrib/svn_buildbot.py, or
# contrib/arch_buildbot.py :
#
#from buildbot.changes.pb import PBChangeSource
#c['change_source'] = PBChangeSource()

# If you want to use SVNPoller, it might look something like
#  # Where to get source code changes
# from buildbot.changes.svnpoller import SVNPoller
# source_code_svn_url='https://svn.myproject.org/bluejay/trunk'

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-17 Thread Antoine Pitrou
On Thursday 16 September 2010 at 22:51 -0400, R. David Murray wrote:
> > > On disk, using utf-8,
> > > one might store the text representation of the message, rather than
> > > the wire-format (ASCII encoded) version.  We might want to write such
> > > messages from scratch.
> > 
> > But then the user knows the encoding (by "user" I mean what/whoever
> > calls the email API) and mentions it to the email package.
> 
> Yes?  And then?  The email package still has to parse the file, and it
> can't use its normal parse-the-RFC-data parser because the file could
> contain *legitimate* non-ASCII header data.  So there has to be a separate
> parser for this case that will convert the non-ASCII data into RFC2047
> encoded data.  At that point you have two parsers that share a bunch of
> code...and my current implementation lets the input to the second parser
> be text, which is the natural representation of that data, the one the
> user or application writer is going to expect.

But you said it yourself: that "e-mail-like" data is not an email.
You could have a separate converter class for these special cases.

Also, I don't understand why an application would want to assemble an
e-mail by itself if it doesn't know how to do so, and produces wrong
data. Why not simply let the application do:

m = Message()
m.add_header("From", "Accented Bàrry ")
m.add_body("Hello Barry")

> > And then you have two separate worlds while ultimately the same
> > concepts are underlying. A library accepting BytesMessage will crash
> > when a program wants to give a StringMessage and vice-versa. That
> > doesn't sound very practical.
> 
> Yes, and a library accepting bytes will crash when a program wants
> to give it a string.  So?  That's how Python3 works.  Unless, of
> course, the application decides to be polymorphic :)

Well, the application wants to handle abstracted e-mail messages. I'm
sure people would rather not deal with the difference(s) between
BytesMessages and StringMessages.

That's like saying we should have BytesConfigParser for bytes
configuration files and StringConfigParser for string configuration
files, with incompatible APIs.

("surrogateescape")
> On the other hand, that might be a way to make the current API work
> at least a little bit better with 8bit input data.  I'll have to think
> about that...

Yes, that's what I was talking about.
You can even choose ("ascii", "surrogateescape") if you don't want to
wrongly choose an 8-bit encoding such as utf-8 or latin-1.
(I'm deliberately ignoring the case where people would use a non-ASCII
compatible encoding such as utf-16; I hope you don't want to support
that :-))
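A small demonstration of the ("ascii", "surrogateescape") combination mentioned above (the header bytes are an invented example):

```python
# Demonstration of ("ascii", "surrogateescape"): ASCII parts decode
# normally, the stray non-ASCII byte becomes a lone surrogate, and
# encoding back reproduces the original bytes exactly. The header
# value here is an invented example.
raw = b"Subject: caf\xe9"                      # \xe9 is not ASCII
text = raw.decode("ascii", "surrogateescape")
assert text == "Subject: caf\udce9"            # stray byte -> surrogate
assert text.startswith("Subject")              # normal parsing still works
assert text.encode("ascii", "surrogateescape") == raw  # lossless round-trip
```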

Regards

Antoine.




Re: [Python-Dev] standards for distribution names

2010-09-17 Thread Dan Buch
On Thu, Sep 16, 2010 at 12:08:59PM +0100, Chris Withers wrote:
> Hi All,
> 
> Following on from this question:
> 
> http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html
> 
> ...I'd thought that the "correct names" for distributions would have
> been documented in one of:
> 
> http://www.python.org/dev/peps/pep-0345
> http://www.python.org/dev/peps/pep-0376
> http://www.python.org/dev/peps/pep-0386
> 
> ...but having read them, I drew a blank.
> 
> Where are the standards for this or is it still a case of "whatever
> setuptools does"?
> 
> Chris

You may also find this thread from the packaging google group useful,
although it may not be quite what you're looking for:

http://bit.ly/96SMuM

Cheers,

-- 
~Dan





Re: [Python-Dev] [issue1633863] AIX: configure ignores $CC

2010-09-17 Thread Martin v. Löwis

Hi Sebastien,


Unfortunately, I don't think this solution is possible for me: I don't
think the security team in my company would appreciate a server
inside our network running arbitrary shell commands provided by an
external source.


I still think this would be the best thing, and I feel that, from a 
security point of view, it doesn't really differ from what you are
doing now already - see below.


Neither can I expose the buildbot master web interface.


That shouldn't be necessary.


Also I had to customize the buildbot rules in order to work with some
specificities of AIX (see attached master.cfg), and I can't guarantee
that this buildbot will run 24 hours a day; I may have to schedule it
only once at night, for example, if it consumes too many resources.

(And the results are very unstable at the moment, mostly because of
issue 9862).


If you are having the build slave compile Python, I'd like to point
out that you *already* run arbitrary shell commands provided by
some external source: if somebody checked malicious commands into 
Python's configure.in, you would unconditionally execute them.

So if it's ok that you run the Python build process at all, it should
(IMO) also be acceptable to run a build slave.

If there are concerns that running it under your Unix account gives it
too much power, you should create a separate, locked-down account.


On the other hand, I could upload the build results with rsync or scp
somewhere or set up a MailNotifier if that can help.

How do you think I could share those results?


I'd be hesitant to support this as a special case. If the results
are not in the standard locations, people won't look at them, anyway.
Given that one often also needs access to the hardware in order to
fix problems, it might be sufficient if only you look at the buildslave
results, and create bug reports whenever you notice a problem.

Regards,
Martin


Re: [Python-Dev] [issue1633863] AIX: configure ignores $CC

2010-09-17 Thread Antoine Pitrou
On Fri, 17 Sep 2010 11:40:12 +0200
Sébastien Sablé  wrote:
> Hi Martin,
> 
> I have started to correct quite a lot of issues I have with Python on 
> AIX, and since I had to test quite a lot of patches, I thought it would be 
> more convenient to set up a buildbot for that platform.
> 
> So I now have a buildbot environment with 2 slaves (AIX 5.3 and 6.1) 
> that builds and tests Python (branch py3k) with both gcc and xlc (the 
> native AIX compiler). I have 4 builders ("py3k-aix6-xlc", 
> "py3k-aix5-xlc", "py3k-aix6-gcc", "py3k-aix5-gcc").

Following on from Martin's comments, you might also want to share things
with the ActiveState guys who, AFAIK, maintain an AIX version of Python
(but you have been the most active AIX user on the bug tracker lately;
perhaps they are keeping their patches to themselves).

(see http://www.activestate.com/activepython )

Regards

Antoine.




Re: [Python-Dev] Add PEP 444, Python Web3 Interface.

2010-09-17 Thread Martin v. Löwis

On 16.09.10 02:02, John Nagle wrote:

On 9/15/2010 4:44 PM, [email protected] wrote:

``SERVER_PORT`` must be a bytes instance (not an integer).


What's that supposed to mean? What goes in the "bytes
instance"? A character string in some format? A long binary
number? If the latter, with which byte ordering? What
problem does this solve?


Just interpreting (i.e. not having participated in the specification):

Given the CGI background of all this, SERVER_PORT is an ASCII-encoded
decimal rendering of the port number.

As to what problem this solves: I guess it allows for easy pass-through
from the web server.
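Under that reading, the value could be produced like this (a hypothetical illustration; the environ dict and its handling are invented for the example, not taken from PEP 444):

```python
# Hypothetical illustration of SERVER_PORT as an ASCII-encoded
# decimal rendering of the port number (the environ dict is invented
# here, not taken from PEP 444).
port = 8080
environ = {'SERVER_PORT': str(port).encode('ascii')}

assert environ['SERVER_PORT'] == b'8080'       # bytes, not the int 8080
# recovering the integer when needed is trivial:
assert int(environ['SERVER_PORT'].decode('ascii')) == 8080
```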

Regards,
Martin


Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-17 Thread Nick Coghlan
On Fri, Sep 17, 2010 at 5:43 AM, Martin (gzlist)  wrote:
> In the example I gave, 十 encodes in CP932 as '\x8f\\', and the
> function gets confused by the second byte. Obviously the right answer
> there is just to use unicode, rather than write a function that works
> with weird multibyte codecs.

That does make it clear that "ASCII superset" is an inaccurate term -
a better phrase is "ASCII compatible", since that correctly includes
multibyte codecs like UTF-8 which explicitly ensure that the byte
values in multibyte characters are all outside the 0x00 to 0x7F range
of ASCII.

So the domain of any polymorphic text manipulation functions we define would be:
  - Unicode strings
  - byte sequences where the encoding is either:
- a single byte ASCII superset (e.g. iso-8859-*, cp1252, koi8*, mac*)
- an ASCII compatible multibyte encoding (e.g. UTF-8, EUC-JP)

Passing in byte sequences that are encoded using an ASCII incompatible
multibyte encoding (e.g. CP932, UTF-7, UTF-16, UTF-32, shift-JIS,
big5, iso-2022-*, EUC-CN/KR/TW) or a single byte encoding that is not
an ASCII superset (e.g. EBCDIC) will have undefined results.

I think that's still a big enough win to be worth doing, particularly
as more and more of the other variable width multibyte encodings are
phased out in favour of UTF-8.
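The distinction can be checked directly in Python, reusing the CP932 example from earlier in the thread (the sample text is invented):

```python
# "ASCII compatible" in practice: UTF-8 keeps every byte of a
# multibyte character outside the 0x00-0x7F range, so byte-level
# processing of ASCII delimiters is safe; CP932 does not.
assert all(b >= 0x80 for b in "十".encode("utf-8"))
assert "十: ten".encode("utf-8").split(b":")[1] == b" ten"  # safe split

# CP932 encodes 十 as b"\x8f\\" -- the second byte is ASCII '\',
# which is exactly what confuses naive byte-level code.
assert "十".encode("cp932") == b"\x8f\\"
```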

Cheers,
Nick.

P.S. Hey Barry, is there anyone at Canonical you can poke about
https://bugs.launchpad.net/xorg-server/+bug/531208? Tinkering with
this stuff on Kubuntu would be significantly less annoying if I could
easily type arbitrary Unicode characters into Konsole ;)

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-17 Thread Barry Warsaw
On Sep 17, 2010, at 12:10 PM, Antoine Pitrou wrote:

>Also, I don't understand why an application would want to assemble an
>e-mail by itself if it doesn't know how to do so, and produces wrong
>data. Why not simply let the application do:
>
>m = Message()
>m.add_header("From", "Accented Bàrry ")
>m.add_body("Hello Barry")

Very often you'll start with a template of a message your application wants to
send.  Then you'll interpolate a few values into it, and you'd like to easily
convert the result into an RFC valid email.

Is that template bytes or text (or either)?

-Barry




Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-17 Thread Barry Warsaw
On Sep 16, 2010, at 11:45 PM, Terry Reedy wrote:

>Based on the discussion so far, I think you should go ahead and
>implement the API agreed on by the mail sig both because it *has* been
>agreed on (and thinking about the wsgi discussion, that seems to be a
>major achievement) and because it seems sensible to me also, as far as
>I understand it. The proof of the API will be in the testing. As long
>as you *think* it covers all intended use cases, I am not sure that
>abstract discussion can go much further.

+1

>I do have a thought about space and duplication. For normal messages,
>it is not an issue. For megabyte (or in the future, gigabyte?)
>attachments, it is. So if possible, there should only be one extracted
>blob for both bytes and string versions of parsed messages. Or even
>make the extraction from the raw stream lazy, when specifically
>requested.

This has been discussed in the email-sig.  Many people have asked for an API
where message payloads can be stored on-disk instead of in-memory.  Headers, I
don't think, will ever practically be so big as to not be storable in memory.
But if your message has a huge mp3, the parser should have the option to leave
the bytes of that payload in a disk cache and transparently load it when
necessary.

I think we should keep that in mind, but it's way down on the list of "gotta
haves" for email6.

-Barry




[Python-Dev] Summary of Python tracker Issues

2010-09-17 Thread Python tracker

ACTIVITY SUMMARY (2010-09-10 - 2010-09-17)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues stats:
  open2541 (+42)
  closed 19128 (+69)
  total  21669 (+65)

Open issues with patches: 1060 


Issues opened (42)
==

#9824: SimpleCookie should escape commas and semi-colons
http://bugs.python.org/issue9824  opened by spookylukey

#9831: test_distutils fails on MacOSX 10.6
http://bugs.python.org/issue9831  opened by cartman

#9838: Inadequate C-API to Python 3 I/O objects
http://bugs.python.org/issue9838  opened by pv

#9841: sysconfig and distutils.sysconfig differ in subtle ways
http://bugs.python.org/issue9841  opened by eric.araujo

#9842: Document ... used in recursive repr of containers
http://bugs.python.org/issue9842  opened by eric.araujo

#9844: calling nonexisting function under __INSURE__
http://bugs.python.org/issue9844  opened by eli.bendersky

#9845: Allow changing the method in urllib.request.Request
http://bugs.python.org/issue9845  opened by tarek

#9846: ZipExtFile provides no mechanism for closing the underlying fi
http://bugs.python.org/issue9846  opened by john.admanski

#9849: Argparse needs better error handling for nargs
http://bugs.python.org/issue9849  opened by Jason.Baker

#9850: obsolete macpath module dangerously broken and should be remov
http://bugs.python.org/issue9850  opened by ned.deily

#9851: multiprocessing socket timeout will break client
http://bugs.python.org/issue9851  opened by hume

#9852: test_ctypes fail with clang
http://bugs.python.org/issue9852  opened by cartman

#9854: SocketIO should return None on EWOULDBLOCK
http://bugs.python.org/issue9854  opened by pitrou

#9856: Change object.__format__(s) where s is non-empty to a Deprecat
http://bugs.python.org/issue9856  opened by eric.smith

#9857: SkipTest in tearDown is reported an as an error
http://bugs.python.org/issue9857  opened by pitrou

#9858: Python and C implementations of io are out of sync
http://bugs.python.org/issue9858  opened by pitrou

#9859: Add tests to verify API match of modules with 2 implementation
http://bugs.python.org/issue9859  opened by stutzbach

#9860: Building python outside of source directory fails
http://bugs.python.org/issue9860  opened by belopolsky

#9861: subprocess module changed exposed attributes
http://bugs.python.org/issue9861  opened by pclinch

#9862: test_subprocess hangs on AIX
http://bugs.python.org/issue9862  opened by sable

#9864: email.utils.{parsedate,parsedate_tz} should have better return
http://bugs.python.org/issue9864  opened by pitrou

#9865: OrderedDict doesn't implement __sizeof__
http://bugs.python.org/issue9865  opened by pitrou

#9866: Inconsistencies in tracing list comprehensions
http://bugs.python.org/issue9866  opened by belopolsky

#9867: Interrupted system calls are not retried
http://bugs.python.org/issue9867  opened by aronacher

#9868: test_locale leaves locale changed
http://bugs.python.org/issue9868  opened by ocean-city

#9869: long_subtype_new segfault in pure-Python code
http://bugs.python.org/issue9869  opened by cwitty

#9871: IDLE dies when using some regex
http://bugs.python.org/issue9871  opened by Popa.Claudiu

#9873: Allow bytes in some APIs that use string literals internally
http://bugs.python.org/issue9873  opened by ncoghlan

#9874: Message.attach() loses empty attachments
http://bugs.python.org/issue9874  opened by [email protected]

#9875: Garbage output when running setup.py on Windows
http://bugs.python.org/issue9875  opened by exarkun

#9876: ConfigParser can't interpolate values from other sections
http://bugs.python.org/issue9876  opened by asolovyov

#9877: Expose sysconfig._get_makefile_filename() in public API
http://bugs.python.org/issue9877  opened by barry

#9878: Avoid parsing pyconfig.h and Makefile by autogenerating extens
http://bugs.python.org/issue9878  opened by barry

#9880: Python 2.7 Won't Build: SystemError: unknown opcode
http://bugs.python.org/issue9880  opened by Tom.Browder

#9882: abspath from directory
http://bugs.python.org/issue9882  opened by ipatrol

#9883: minidom: AttributeError: DocumentFragment instance has no attr
http://bugs.python.org/issue9883  opened by Aubrey.Barnard

#9884: The 4th parameter of method always None or 0 on x64 Windows.
http://bugs.python.org/issue9884  opened by J2.NETe

#9886: Make operator.itemgetter/attrgetter/methodcaller easier to dis
http://bugs.python.org/issue9886  opened by ncoghlan

#9887: distutil's build_scripts doesn't read utf-8 in all locales
http://bugs.python.org/issue9887  opened by hagen

#460474: codecs.StreamWriter: reset() on close()
http://bugs.python.org/issue460474  reopened by r.david.murray

#767645: incorrect os.path.supports_unicode_filenames
http://bugs.python.org/issue767645  reopened by haypo

#1076515: shutil.move clobbers read-only files.
http://bugs.python.org/issue1076515  reopened by brian.curtin




Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-17 Thread Michael Foord

 On 16/09/2010 23:05, Antoine Pitrou wrote:

On Thu, 16 Sep 2010 16:51:58 -0400
"R. David Murray"  wrote:

What do we store in the model?  We could say that the model is always
text.  But then we lose information about the original bytes message,
and we can't reproduce it.  For various reasons (mailman being a big one),
this is not acceptable.  So we could say that the model is always bytes.
But we want access to (for example) the header values as text, so header
lookup should take string keys and return string values[2].

Why can't you have both in a single class? If you create the class
using a bytes source (a raw message sent by SMTP, for example), the
class automatically parses and decodes it to unicode strings; if you
create the class using a unicode source (the text body of the e-mail
message and the list of recipients, for example), the class
automatically creates the bytes representation.

I think something like this would be great for WSGI. Rather than focus 
on whether bytes *or* text should be used, use a higher level object 
that provides a bytes view, and (where possible/appropriate) a unicode 
view too.


Michael



(of course all processing can be done lazily for performance reasons)


What about email files on disk?  They could be bytes, or they could be,
effectively, text (for example, utf-8 encoded).

Such a file can be two things:
- the raw encoding of a whole message (including headers, etc.), then
   it should be fed as a bytes object
- the single text body of a hypothetical message, then it should be fed
   as a unicode object

I don't see any possible middle-ground.


On disk, using utf-8,
one might store the text representation of the message, rather than
the wire-format (ASCII encoded) version.  We might want to write such
messages from scratch.

But then the user knows the encoding (by "user" I mean what/whoever
calls the email API) and mentions it to the email package.

What I'm having an issue with is that you are talking about a bytes
representation and a unicode representation of a message. But they
aren't representations of the same things:
- if it's a bytes representation, it will be the whole, raw message
   including envelope / headers (also, MIME sections etc.)
- if it's a unicode representation, it will only be a section of the
   message decodable as such (a text/plain MIME section, for example;
   or a decoded header value; or even a single e-mail address part of a
   decoded header)

So, there doesn't seem to be any reason for having both a BytesMessage
and a UnicodeMessage at the same abstraction level. They are both
representing different things at different abstraction levels. I don't
see any potential for confusion: raw assembled e-mail message = bytes;
decoded text section of a message = unicode.

As for the problem of potential "bogus" raw e-mail data
(e.g., undecodable headers), well, I guess the library has to make a
choice between purity and practicality, or perhaps let the user choose
themselves. For example, through a `strict` flag. If `strict` is true,
raise an error as soon as a non-decodable byte appears in a header, if
`strict` is false, decode it through a default (encoding, errors)
convention which can be overridden by the user (a sensible possibility
being "utf-8, surrogateescape" to allow for lossless round-tripping).
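The strict-flag idea could look something like this (`decode_header` here is a hypothetical helper, not part of the email package):

```python
# Sketch of the `strict` choice described above; decode_header is a
# hypothetical helper, not part of the email package.
def decode_header(raw: bytes, strict: bool = False,
                  encoding: str = "utf-8") -> str:
    errors = "strict" if strict else "surrogateescape"
    return raw.decode(encoding, errors)

assert decode_header(b"X-Id: ok") == "X-Id: ok"
assert decode_header(b"bogus\xff") == "bogus\udcff"   # lossless fallback
try:
    decode_header(b"bogus\xff", strict=True)
except UnicodeDecodeError:
    pass  # strict mode raises on the undecodable byte
else:
    raise AssertionError("expected UnicodeDecodeError")
```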


As I said above, we could insist that files on
disk be in wire-format, and for many applications that would work fine,
but I think people would get mad at us if we didn't support text files[3].

Again, this simply seems to be two different abstraction levels:
pre-generated raw email messages including headers, or a single text
waiting to be embedded in an actual e-mail.


Anyway, what polymorphism means in email is that if you put in bytes,
you get a BytesMessage, if you put in strings you get a StringMessage,
and if you want the other one you convert.

And then you have two separate worlds while ultimately the same
concepts are underlying. A library accepting BytesMessage will crash
when a program wants to give a StringMessage and vice-versa. That
doesn't sound very practical.


[1] Now that surrogateescape exists, one might suppose that strings
could be used as an 8bit channel, but that only works if you don't need
to *parse* the non-ASCII data, just transmit it.

Well, you can parse it, precisely. Not only that, but it round-trips if you
unparse it again:


>>> header_bytes = b"From: bogus\xFFname"
>>> name, value = header_bytes.decode("utf-8", "surrogateescape").split(":")
>>> name
'From'
>>> value
' bogus\udcffname'
>>> "{0}:{1}".format(name, value).encode("utf-8", "surrogateescape")
b'From: bogus\xffname'


In the end, what I would call a polymorphic best practice is "try to
avoid bytes/str polymorphism if your domain is well-defined
enough" (which I admit URLs aren't necessarily; but there's no
question a single text/XXX e-mail section is text, and a whole
assembled e-mail message is bytes).

Regards

Antoine.

Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-17 Thread Ian Bicking
On Fri, Sep 17, 2010 at 3:25 PM, Michael Foord wrote:

>  On 16/09/2010 23:05, Antoine Pitrou wrote:
>
>> On Thu, 16 Sep 2010 16:51:58 -0400
>> "R. David Murray"  wrote:
>>
>>> What do we store in the model?  We could say that the model is always
>>> text.  But then we lose information about the original bytes message,
>>> and we can't reproduce it.  For various reasons (mailman being a big
>>> one),
>>> this is not acceptable.  So we could say that the model is always bytes.
>>> But we want access to (for example) the header values as text, so header
>>> lookup should take string keys and return string values[2].
>>>
>> Why can't you have both in a single class? If you create the class
>> using a bytes source (a raw message sent by SMTP, for example), the
>> class automatically parses and decodes it to unicode strings; if you
>> create the class using a unicode source (the text body of the e-mail
>> message and the list of recipients, for example), the class
>> automatically creates the bytes representation.
>>
>>  I think something like this would be great for WSGI. Rather than focus on
> whether bytes *or* text should be used, use a higher level object that
> provides a bytes view, and (where possible/appropriate) a unicode view too.
>

This is what WebOb does; e.g., there is only a bytes version of a POST body,
and a view on that body that does decoding and encoding.  If you don't touch
something, it is never decoded or encoded.  I only vaguely understand the
specifics here, and I suspect the specifics matter, but this seems
applicable in this case too -- if you have an incoming email with a
smattering of bytes, inline (2047) encoding, other encoding declarations,
and then orthogonal systems like quoted-printable, you don't want to touch
that stuff if you don't need to as handling unicode objects implies you are
normalizing the content, and that might have subtle impacts you don't know
about, or don't want to know about, or maybe just don't fit into the unicode
model (like a string with two character sets).

Note that WebOb does not have two views; it has only one view: unicode
viewing bytes.  I'm not sure I could keep two views straight.  I *think*
Antoine is describing two possible canonical data types (unicode or bytes)
and two views.  That sounds hard.
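The "single canonical bytes plus a decoding view" idea described above can be sketched in a few lines. The class and attribute names here are invented for illustration and are not WebOb's actual API:

```python
class Body:
    """Sketch: the raw bytes are canonical and stored untouched;
    decoding happens lazily, only when the text view is used, so
    data that is never inspected as text is never normalized."""

    def __init__(self, raw, charset='utf-8'):
        self._raw = raw          # canonical data: always bytes
        self._charset = charset

    @property
    def raw(self):
        return self._raw

    @property
    def text(self):
        # the unicode view, computed on demand from the bytes
        return self._raw.decode(self._charset)

    @text.setter
    def text(self, value):
        # writes through the view re-encode into the canonical bytes
        self._raw = value.encode(self._charset)
```

Because the bytes are only decoded on access, content that merely passes through (e.g. is re-serialized unchanged) is never touched.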

-- 
Ian Bicking  |  http://blog.ianbicking.org
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Catalog-sig] egg_info in PyPI

2010-09-17 Thread Tarek Ziadé
On Fri, Sep 17, 2010 at 10:02 PM, Jannis Leidel  wrote:
> On 17.09.2010, at 20:43, Martin v. Löwis wrote:
>
>> Here at the DZUG conference, we are planning to integrate explicit access to 
>> setuptools metadata to the package index.
>>
>> The current idea is to store the contents of the egg_info directory,
>> giving remote access to specific files. By default, PyPI will extract,
>> per release, data from the egg that may get uploaded (using the first
>> one if multiple eggs get uploaded). If no egg gets uploaded, a VM
>> based build service will generate it from a source distribution.
>> Tools like setuptools or distribute could also directly upload this
>> information, e.g. as part of the register command.
>>
>> Any opinions?
>
> I'm confused, wouldn't that basically be a slap in the face for the people 
> who worked on PEP 345 and distutils2, especially during the Summer of Code?
>
> Also, and I understand enthusiasm tends to build up during conferences, but 
> wouldn't supporting setuptools' egg-info directory again be a step backwards 
> after all those months of discussion about the direction of Python packaging?

Yeah, we worked on a new standard that was accepted - PEP 345

PyPI is in fact already publishing PEP 345 info; I did the patch, and
there's one package that already uses it:
http://pypi.python.org/pypi/Distutils2.  (no deps on this one, but
other stuff like links..)

I am about to release the work we did during GSoC in distutils2, a
first beta that includes all the work we've done.

Now you want to publish another metadata format at PyPI?  If PyPI
takes that direction and adopts, promotes and publishes a standard
that is not the one we worked on over the past year, it will be much
harder for us to push the new format so it's adopted by the tools and
then by the community.

People will just get confused because they will find two competing
metadata formats.  That's exactly the situation we were in before, and
that's exactly where I don't want to go back.

I don't even understand the benefit of doing this, since an egg_info
directory is obtained at *build* time and can differ from one machine
to another; it seems pretty useless to me to publish it.

The whole point of PEP 345 is to extend our metadata to statically
provide dependencies at PyPI, thanks to a micro-language that allows
you to describe dependencies for any platform.
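The micro-language referred to here is PEP 345's environment markers. As a rough illustration of how a tool might evaluate one (this is a toy: real markers can combine comparisons with `and`/`or`, and version comparisons should be version-aware rather than plain string comparisons):

```python
import operator
import os
import re
import sys

# A subset of the environment names PEP 345 makes available to markers.
_ENVIRONMENT = {
    'python_version': '%d.%d' % sys.version_info[:2],
    'sys.platform': sys.platform,
    'os.name': os.name,
}

_OPS = {'==': operator.eq, '!=': operator.ne,
        '<': operator.lt, '>': operator.gt,
        '<=': operator.le, '>=': operator.ge}

def evaluate_marker(marker):
    """Evaluate a single comparison such as "python_version < '3'".

    Only handles one <name> <op> '<literal>' comparison; a real
    implementation needs a full parser.
    """
    m = re.match(r"\s*([\w.]+)\s*(==|!=|<=|>=|<|>)\s*'([^']*)'\s*$", marker)
    if m is None:
        raise ValueError('unsupported marker: %r' % marker)
    name, op, literal = m.groups()
    return _OPS[op](_ENVIRONMENT[name], literal)
```

The point for PyPI is that such markers can be evaluated by the *client* at install time, so the dependency list published on the index can stay static and platform-independent.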

We worked hard to build some standards, but if PyPI doesn't help us
here, everything we did and are doing is useless.

Tarek

-- 
Tarek Ziadé | http://ziade.org


[Python-Dev] Some news from my sandbox project

2010-09-17 Thread Victor Stinner
Hi,

I've been developing my sandbox project irregularly since last June. pysandbox 
is a sandbox to execute untrusted Python code. It is able to execute unmodified 
Python code with low overhead. I consider it stable and secure.
http://github.com/haypo/pysandbox/

Today, the biggest problem is creating a read-only view of the 
__builtins__ dictionary. I tried to create my own object implementing the dict 
API, but quickly got a segfault. I realized that ceval.c is hardcoded to use 
the PyDict functions on __builtins__ (the LOAD_GLOBAL instruction). So I 
created a subclass of dict and replaced the mutating methods (__setitem__, 
update, clear, ...).
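A minimal sketch of that approach (subclass dict, make every mutating method raise); the class name and the choice of exception here are illustrative, not pysandbox's actual code:

```python
class ReadOnlyBuiltins(dict):
    """A dict subclass whose mutating methods all raise, so lookups
    (including LOAD_GLOBAL's fast PyDict path) keep working while
    in-place modification through normal method calls is blocked."""

    def _blocked(self, *args, **kwargs):
        raise ValueError('read-only dictionary')

    __setitem__ = __delitem__ = _blocked
    clear = update = pop = popitem = setdefault = _blocked
```

Reads behave like a normal dict; any attempt to mutate through the instance raises.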

I would like to know whether you would agree to modify ceval.c (and maybe some 
other functions) to support a __builtins__ of a type other than dict, i.e. add 
a fast check (PyDict_CheckExact) on the type. If you agree, I will file an 
issue with a patch.

The last two vulnerabilities came from this problem: it was possible to use 
dict methods on __builtins__, e.g. dict.update(__builtins__, {...}) and 
dict.__init__(__builtins__, {...}). Because of that, pysandbox removes all 
dict methods able to modify a dict, and so "d={...}; d.update(...)" raises an 
error (d has no update attribute) :-/
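A minimal demonstration of that vulnerability class, and of why overriding the mutating methods on a subclass is not enough by itself: the override guards the normal path, but the base class's methods can still be invoked directly (class name invented for illustration):

```python
class Guarded(dict):
    # naive protection: block the normal mutation path
    def __setitem__(self, key, value):
        raise ValueError('read-only')

g = Guarded(a=1)

try:
    g['b'] = 2                 # blocked by the override
except ValueError:
    pass

# ...but reaching for the base class method bypasses the override:
dict.__setitem__(g, 'b', 2)
assert g['b'] == 2

dict.update(g, c=3)            # same bypass via dict.update
assert g['c'] == 3
```

This is why pysandbox ends up removing the mutating methods from the dict type itself, at the cost of breaking them for ordinary dicts inside the sandbox.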

---

If you would like to test pysandbox, just join the ##fschfsch channel on the 
Freenode IRC server and talk to fschfsch, an IRC bot that uses pysandbox to 
evaluate Python expressions. It is also on the #python-fr and #python channels, 
but please use ##fschfsch for tests.
http://github.com/haypo/pysandbox/wiki/fschfsch

Or you can run pysandbox on your own computer. Download the latest git version 
(GitHub provides tarballs if you don't have the git program), install it and 
run: python interpreter.py. You have to compile _sandbox, a C module required 
to modify some Python internals.

The latest git version is compatible with Python 2.5, 2.6 and 2.7. It works on 
3.1 and 3.2 after a conversion with 2to3 and a small hack in sandbox/proxy.py: 
replace "elif isinstance(value, OBJECT_TYPES):" with "else:" (and remove the 
existing else clause). I'm not sure that this hack is safe, so I haven't 
committed it yet.

-- 
Victor Stinner
http://www.haypocalc.com/


Re: [Python-Dev] [Catalog-sig] egg_info in PyPI

2010-09-17 Thread Sridhar Ratnakumar

On 2010-09-17, at 4:04 PM, Tarek Ziadé wrote:

> I am not even understanding what's the benefit of doing this since an
> egg_info directory is obtained at *build* time and can differ from a
> machine to another, so it seems pretty useless for me to publish this.

I am in full agreement with Tarek here. At ActiveState, we maintain our own 
index that differs from PyPI in two ways (among others):

- it uses setuptools.package_index to scrape sdists for packages that don't 
upload them to PyPI
- PKG-INFO and requires.txt are extracted from each sdist (generated with the 
egg_info command if they don't exist)

(our index then provides the full metadata, with internal links to the sdists, 
as a sqlite db for the builder processes on each platform)

The problem with extracting PKG-INFO and requires.txt on the index server is 
that the contents of requires.txt sometimes differ based on the platform and 
Python version on which the egg_info command was run. For example, the "tox" 
project depends[1] on the "virtualenv" package when run under Python 2, but 
not under Python 3.

> The whole point of PEP 345 is to extend our metadata to statically
> provide dependencies at PyPI, thanks to a micro-language that allows
> you to describe dependencies for any platform.

Static metadata would allow packages like "tox" to declare Python-version- or 
platform-specific dependencies without resorting to runtime checks.
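For instance, a PEP 345-style PKG-INFO could express the tox/virtualenv dependency mentioned above statically with an environment marker (an illustrative fragment, not tox's actual metadata; the version number is made up):

```
Metadata-Version: 1.2
Name: tox
Version: 0.5
Requires-Dist: virtualenv; python_version < '3'
```

An index can publish this as-is for every platform, and each installer evaluates the marker locally.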

> We worked hard to build some standards, but if PyPI doesn't help us
> here, everything we did and are doing is useless.

Ideally, in the future, I should be able to query static metadata (with 
environment markers[2] and such) for *any* package on PyPI, and this static 
metadata would simply be a DIST-INFO file (instead of a directory with a bunch 
of files in it). I don't really see a point in providing access to, say, the 
list of entry points of each package. As far as package managers are concerned, 
the only things that matter are a) the list of package names and versions, b) a 
source tarball for each release, and c) the corresponding metadata with 
dependency information.

-srid

[1] http://code.google.com/p/pytox/source/browse/setup.py#30
[2] http://www.python.org/dev/peps/pep-0345/#environment-markers