date:20100621

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Stephen J. Turnbull

Robert Collins writes:

 > Also, url's are bytestrings - by definition;

Eh?  RFC 3986 explicitly says

A URI is an identifier consisting of a sequence of characters
matching the syntax rule named <URI> in Section 3.

(where the phrase "sequence of characters" appears in all ancestors I
found back to RFC 1738), and

2.  Characters

The URI syntax provides a method of encoding data, presumably for
the sake of identifying a resource, as a sequence of characters.
The URI characters are, in turn, frequently encoded as octets for
transport or presentation.  This specification does not mandate any
particular character encoding for mapping between URI characters
and the octets used to store or transmit those characters.  When a
URI appears in a protocol element, the character encoding is
defined by that protocol; without such a definition, a URI is
assumed to be in the same character encoding as the surrounding
text.

 > if the standard library has made them unicode objects in 3, I
 > expect a lot of pain in the webserver space.

Yup.  But pain is inevitable if people are treating URIs (whether URLs
or otherwise) as octet sequences.  Then your base URL is gonna be
b'mailto:[email protected]', but the natural thing the UI will want
to do is 

formurl = baseurl + '?subject=うるさいやつだなぁ…'

IMO, the UI is right.  "Something" like the above "ought" to work.

So the function that actually handles composing the URL should take a
string (ie, unicode), and do all escaping.  The UI code should not
need to know about escaping.  If nothing escapes except the function
that puts the URL in composed form, and that function always escapes,
life is easy.
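A minimal sketch of that rule in Python 3. It assumes a hypothetical helper named `compose_mailto`, which is not a stdlib function. UI code passes plain str values, and the helper is the only place where escaping happens. It uses `urllib.parse.quote`, which percent-encodes str input as UTF-8 by default. The composed URI is still text; only the final `.encode('ascii')` makes it ready for the wire.

```python
from urllib.parse import quote


def compose_mailto(address: str, subject: str) -> str:
    """Hypothetical helper: build a mailto: URI from text, escaping only here.

    Callers (e.g. UI code) pass ordinary str values and never escape anything
    themselves.
    """
    # '@' stays literal in the address; in the subject every reserved
    # character (including '&', '=', '?') is percent-encoded.
    return 'mailto:' + quote(address, safe='@') + '?subject=' + quote(subject, safe='')


uri = compose_mailto('user@example.com', 'うるさいやつだなぁ…')
# Percent-encoding leaves only ASCII characters, so this encode cannot fail.
wire = uri.encode('ascii')
```

By contrast, `b'mailto:...' + '?subject=...'` simply raises TypeError in Python 3, which is the pain described above. Keeping everything as str until the last step avoids it.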

Of course, in real life it's not that easy.  But it's possible to make
things unnecessarily hard for the users of your URI API(s), and one
way to do that is to make URIs into "just bytes" (and "just unicode"
is probably nearly as bad, except that at least you know it's not
ready for the wire).

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Lennart Regebro

2010/6/21 Stephen J. Turnbull :
> IMO, the UI is right.  "Something" like the above "ought" to work.

Right. That said, many times when you want to do urlparse etc they
might be binary, and you might want binary. So maybe the methods
should work with both?
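Here is what "work with both" can look like in practice. In later Python 3 releases, the `urllib.parse` parsing functions such as `urlsplit` accept either str or ASCII-only bytes, and they return components of the same type as their input. Non-ASCII bytes are rejected with UnicodeDecodeError rather than guessed at. A short sketch:

```python
from urllib.parse import urlsplit

text_parts = urlsplit('http://example.com/a/b?q=1')
byte_parts = urlsplit(b'http://example.com/a/b?q=1')

# Each result mirrors the type of its input.
print(text_parts.netloc, text_parts.path)   # example.com /a/b
print(byte_parts.netloc, byte_parts.path)   # b'example.com' b'/a/b'
```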

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Nick Coghlan

On Mon, Jun 21, 2010 at 11:58 AM, P.J. Eby  wrote:
> At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote:
>>
>> Perhaps if people could identify which specific string methods are
>> causing problems?
>
> __getitem__(int) returns an integer rather than a bytestring, so anything
> that manipulates individual characters can't be given bytes and have it
> work.

It can if you use length one slices rather than simple indexing.
Depending on the details, such algorithms may still fail for
multi-byte codecs though.
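A small illustration of the slicing technique. Indexing bytes gives an int, while a length-one slice gives a length-one bytes object. A loop written with slices therefore runs unchanged on str or bytes. `leading_run` is a hypothetical helper for illustration only.

```python
def leading_run(seq, ch):
    """Count leading occurrences of ch in seq; works for str or bytes.

    ch must be a length-one object of the same type as seq ('/' or b'/').
    """
    if len(ch) != 1:
        raise ValueError('ch must have length one')
    n = 0
    # seq[n:n + 1] is a length-one str or bytes (or empty past the end),
    # whereas seq[n] would be an int when seq is bytes.
    while seq[n:n + 1] == ch:
        n += 1
    return n


print(b'/x'[0], b'/x'[:1])                                      # 47 b'/'
print(leading_run('//srv', '/'), leading_run(b'//srv', b'/'))   # 2 2
```

As noted above, this still works on code units rather than characters. With UTF-8 data, `'é'.encode('utf-8')[:1]` is `b'\xc3'`, which is half of a character.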

> That was one of the key differences I had in mind for a bstr type, apart
> from  designing it to coerce normal strings to bstrs in cross-type
> operations, and to allow O(1) "conversion" to/from bytes.

Erk, that just sounds like a recipe for recreating the problems 2.x
has in a new form.

> Another randomly chosen byte/string incompatibility (Python 3.1; I don't
> have 3.2 handy at the moment):
>
> >>> os.path.join(b'x','y')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "c:\Python31\lib\ntpath.py", line 161, in join
>    if b[:1] in seps:
> TypeError: Type str doesn't support the buffer API
>
> >>> os.path.join('x',b'y')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "c:\Python31\lib\ntpath.py", line 161, in join
>    if b[:1] in seps:
> TypeError: 'in <string>' requires string as left operand, not bytes
>
> Ironically, it seems to me that in trying to make the type distinction more
> rigid, Py3K fails in this area precisely because it is not a rigidly typed
> language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I
> need two stringlike objects of the *same type*", not even in its docstring.

I believe it actually needs the objects to be compatible with the type
of os.sep, rather than just with each other (i.e. the type
restrictions on os.path.join are the same as those on os.sep.join,
even though the join algorithm itself is slightly different). This
restriction should be mentioned in the Py3k docstring and docs for
os.path.join - if it isn't, that would be a doc bug.

> At least in Java, you would either implement a "path" type with coercions
> from bytes and strings, or you'd have a class with overloaded methods for
> handling join operations on bytes and strings, respectively, thereby
> avoiding this whole mess.
>
> (Alas, this little example on the 'in' operator also shows that my bstr
> effort would probably fail anyway, because there's no '__rcontains__'
> (__lcontains__?) to allow it to override the str type's __contains__.)

OK, these examples convince me that the incompatibility problem is
real. However, I don't think a bstr type can solve them even without
the __rcontains__ problem - it would just recreate the pain that we
already have in the 2.x world.

Something that may make sense to ease the porting process is for some
of these "on the boundary" I/O related string manipulation functions
(such as os.path.join) to grow "encoding" keyword-only arguments. The
recommended approach would be to provide all strings, but bytes could
also be accepted if an encoding was specified. (If you want to mix
encodings - tough, do the decoding yourself).
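The sketch below shows how such a keyword-only `encoding` argument might behave. It is written as a hypothetical wrapper, `join_paths`, around `os.path.join`, not as a change to the real function. str components pass through unchanged. bytes components are accepted only when an encoding is given, and all of them are decoded with that one encoding.

```python
import os


def join_paths(*parts, encoding=None):
    """Hypothetical os.path.join variant illustrating a keyword-only 'encoding'.

    str components are used as given.  bytes components are accepted only when
    an encoding is supplied, and all of them are decoded with that one encoding;
    mixing encodings is left to the caller.
    """
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            if encoding is None:
                raise TypeError('bytes component given without an encoding')
            part = part.decode(encoding)
        decoded.append(part)
    return os.path.join(*decoded)


print(join_paths('x', b'y', encoding='utf-8'))  # x/y on POSIX
```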

For the idea of avoiding excess copying of bytes through multiple
encoding/decoding calls... isn't that meant to be handled at an
architectural level (i.e. decode once on the way in, encode once on
the way out)? Optimising the single-byte codec case by minimising data
copying (possibly through creative use of PEP 3118) may be something
that we want to look at eventually, but it strikes me as something of
a premature optimisation at this point in time (i.e. the old adage
"first get it working, then get it working fast").
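Here is a minimal sketch of the "decode once on the way in, encode once on the way out" architecture. `process` and `handle` are hypothetical names, and UTF-8 is an assumed default. The point is that the only conversions happen at the two boundaries.

```python
def process(text: str) -> str:
    # Hypothetical application logic: it only ever sees str.
    return text.upper()


def handle(raw: bytes, encoding: str = 'utf-8') -> bytes:
    """Sketch of 'decode once on the way in, encode once on the way out'."""
    text = process(raw.decode(encoding))  # single decode at the input boundary
    return text.encode(encoding)          # single encode at the output boundary
```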

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Nick Coghlan

On Mon, Jun 21, 2010 at 9:06 AM, Laurens Van Houtven  wrote:
> Okay cool, we fixed it: http://python-commandments.org/python3.html
>
> People are otherwise happy with the text?

Yep, looks pretty good to me.

I hope you don't mind, but I actually borrowed your text to seed a
corresponding page on the Python wiki:
http://wiki.python.org/moin/Python2orPython3

It turns out the beginner's guide on the wiki doesn't even acknowledge
the possibility of downloading Python 3.1 rather than 2.6 to start
experimenting with Python.

The Wiki is probably a good place for this kind of material, anyway -
it makes it much easier for people to update as they identify major
third party libraries that do and don't have Py3k compatible versions
(and, some day, Python2 compatible versions).

Cheers,
Nick.

P.S. (We're going to have a tough decision to make somewhere along the
line where docs.python.org is concerned, too - when do we flick the
switch and make a 3.x version of the docs the default? We probably
won't need to seriously consider that question until the 3.3 time
frame though).

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Nick Coghlan

On Mon, Jun 21, 2010 at 12:30 PM, P.J. Eby  wrote:
> I also find it weird that there seem to be two camps on this subject, one of
> which claims that All Is Well And There Is No Problem -- but I do not recall
> seeing anyone who was in the "What do I do; this doesn't seem ready" camp
> who switched sides and took the time to write down what made them realize
> that they were wrong about there being a problem, and what steps they had to
> take.  The existence of one or more such documents would certainly ease my
> mind, and I imagine that of other people who are less waiting for others'
> libraries, than for the stdlib (and/or language) itself to settle.
>
> (Or more precisely, for it to be SEEN to have settled.)

I don't know that the "all is well" camp actually exists. The camp
that I do see existing is the one that says "without a bug report,
inconsistencies in the standard library's unicode handling won't get
fixed".

The issues picked up by the regression test suite have already been
dealt with, but that suite is unfortunately far from comprehensive.
Just like a lot of Python code that is out there, the standard library
isn't immune to the poor coding practices that were permitted by the
blurry lines between text and octet streams in 2.x.

It may be that there are places where we need to rewrite standard
library algorithms to be bytes/str neutral (e.g. by using length one
slices instead of indexing). It may be that there are more APIs that
need to grow "encoding" keyword arguments that they then pass on to
the functions they call or use to convert str arguments to bytes (or
vice-versa). But without people trying to port affected libraries and
reporting bugs when they find issues, the situation isn't going to
improve.

Now, if these bugs are already being reported against 3.1 and just
aren't getting fixed, that's a completely different story...

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

[Python-Dev] [OT] carping about irritating people (was: bytes / unicode)

2010-06-21 Thread Ben Finney

"Stephen J. Turnbull"  writes:

> your base URL is gonna be b'mailto:[email protected]', but the
> natural thing the UI will want to do is
>
> formurl = baseurl + '?subject=うるさいやつだなぁ…'

Incidentally, which irritating person was the topic of this
Japanese-language message to you?

(The subject in Stephen's example message translates roughly as
“(unspecified third person) is an irritating rascal, don't you agree”.)

-- 
 \ “The userbase for strong cryptography declines by half with |
  `\  every additional keystroke or mouseclick required to make it |
_o__) work.” —Carl Ellison |
Ben Finney


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Arc Riley

I would suggest that if packages that do not have Python 3 support yet are
listed, then their alternatives should also.

PyQt has had Py3 support for some time.
PostgreSQL and SQLite do (as does SQLAlchemy)
CherryPy has had Py3 support for the last release cycle
libxml2 does not, but lxml does.

Also, under where it mentions that most OS's do not include Python 3, it
should be noted which have good support for it.  Gentoo (for example) has
excellent support for Python 3, automatically installing Python packages
which have Py3 support for both Py2 and Py3, and the python-based Portage
package system runs cleanly on Py2.6, Py3.1 and Py3.2.

Give credit where credit is due. :-)


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 09:37 AM, Arc Riley wrote:

>Also, under where it mentions that most OS's do not include Python 3, it
>should be noted which have good support for it.  Gentoo (for example) has
>excellent support for Python 3, automatically installing Python packages
>which have Py3 support for both Py2 and Py3, and the python-based Portage
>package system runs cleanly on Py2.6, Py3.1 and Py3.2.

We're trying to get there for Ubuntu (driven also by Debian).  We have Python
3.1.2 in main for Lucid, though we will probably not get 3.2 into Maverick
(the October 2010 release).  We're currently concentrating on Python 2.7 as a
supported version because it'll be released by then, while 3.2 will still be
in beta.

If you want to help, or have complaints, kudos, suggestions, etc. for Python
support on Ubuntu, you can contact me off-list.

-Barry


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Nick Coghlan

On Mon, Jun 21, 2010 at 11:37 PM, Arc Riley  wrote:
> I would suggest that if packages that do not have Python 3 support yet are
> listed, then their alternatives should also.
>
> PyQt has had Py3 support for some time.
> PostgreSQL and SQLite do (as does SQLAlchemy)
> CherryPy has had Py3 support for the last release cycle
> libxml2 does not, but lxml does.
>
> Also, under where it mentions that most OS's do not include Python 3, it
> should be noted which have good support for it.  Gentoo (for example) has
> excellent support for Python 3, automatically installing Python packages
> which have Py3 support for both Py2 and Py3, and the python-based Portage
> package system runs cleanly on Py2.6, Py3.1 and Py3.2.
>
> Give credit where credit is due. :-)

A decent listing of major packages that already support Python 3 would
be very handy for the new Python2orPython3 page I created on the wiki,
and easier to keep up-to-date. (The old Early2to3Migrations page
didn't look particularly up to date, but hopefully we can keep the new
list in a happier state).

It just ticked past midnight for me, so I'm off to bed, but for anyone
with a wiki account, have at it:
http://wiki.python.org/moin/Python2orPython3

(Updating the beginner's guide to recognise Python 3 as a valid option
would also be helpful: http://wiki.python.org/moin/BeginnersGuide)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] [OT] carping about irritating people (was: bytes / unicode)

2010-06-21 Thread Nick Coghlan

On Mon, Jun 21, 2010 at 11:17 PM, Ben Finney  wrote:
> "Stephen J. Turnbull"  writes:
>
>> your base URL is gonna be b'mailto:[email protected]', but the
>> natural thing the UI will want to do is
>>
>> formurl = baseurl + '?subject=うるさいやつだなぁ…'
>
> Incidentally, which irritating person was the topic of this
> Japanese-language message to you?
>
> (The subject in Stephen's example message translates roughly as
> “(unspecified third person) is an irritating rascal, don't you agree”.)

Given what he said about the base URL, it would appear to be a
self-deprecating self-description. Nicely done :)

(I can pronounce that subject line, but I didn't know what it meant
without the translation).

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] [OT] carping about irritating people (was: bytes / unicode)

2010-06-21 Thread Nick Coghlan

> Given what he said about the base URL, it would appear to be a
> self-deprecating self-description. Nicely done :)

Gah, no it isn't, you're right, the message leaves it unspecified. OK,
no more posting after midnight for me... (well, not tonight, anyway)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby


At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote:

>For the idea of avoiding excess copying of bytes through multiple
>encoding/decoding calls... isn't that meant to be handled at an
>architectural level (i.e. decode once on the way in, encode once on
>the way out)? Optimising the single-byte codec case by minimising data
>copying (possibly through creative use of PEP 3118) may be something
>that we want to look at eventually, but it strikes me as something of
>a premature optimisation at this point in time (i.e. the old adage
>"first get it working, then get it working fast").


The issue is, I'd like to have an idempotent incantation that I can 
use to make the inputs and outputs to stdlib functions behave in a 
type-safe manner with respect to bytes, in cases where bytes are 
really what I want operated on.


Note too that this is an argument for symmetry in wrapping the inputs 
and outputs, so that the code doesn't have to "know" what it's dealing with!


After all, right now, if a stdlib function might return bytes or 
unicode depending on runtime conditions, I can't even hardcode an 
.encode() call -- it would fail if the return type is a bytes.


This basically goes against the "tell, don't ask" pattern, and the 
Pythonically idempotent approach.  That is, Python builtins normally 
return you back the same thing if it's already what you want - 
int(someInt)-> someInt, iter(someIter)->someIter, etc.


Since this incantation may need to be used often, and in places that 
are not known to me in advance, I would like it to not impose new 
overhead in unexpected places.  (i.e., the usual argument brought 
against making changes to the 'list' type that would change certain 
operations from O(1) to O(log something)).


It's more about predictability, and having One *Obvious* Way To Do 
It, as opposed to "several ways, which you need to think carefully 
about and restructure your entire architecture around if 
necessary".  One obvious way means I can focus on the mechanical 
effort of porting *first*, without having to think.


So, the performance issue isn't really about performance *per se*, so 
much as about the "mental UI" of the language.  You could just as 
easily lie and tell me that your bstr implementation is O(1), and I 
would probably be happy and never notice, because the issue was never 
really about performance as such, but about having to *think* about 
it.  (i.e., breaking flow.)


Really, the entire issue can presumably be dealt with by some series 
of incantations - it's just code after all.  But having to sit and 
think about *every* situation where I'm dealing with bytes/unicode 
distinctions seems like a torture compared to being able to say, 
"okay, so when dealing with this sort of API and this sort of data, 
this is the One Obvious Way to do the conversions."


It's One Obvious Way that I want, but some people seem to be arguing 
that the One Obvious Way is to Think Carefully About It Every Time -- 
and that seems to violate the "Obvious" part, IMO.  ;-)



Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Toshio Kuratomi

On Mon, Jun 21, 2010 at 09:57:30AM -0400, Barry Warsaw wrote:
> On Jun 21, 2010, at 09:37 AM, Arc Riley wrote:
> 
> >Also, under where it mentions that most OS's do not include Python 3, it
> >should be noted which have good support for it.  Gentoo (for example) has
> >excellent support for Python 3, automatically installing Python packages
> >which have Py3 support for both Py2 and Py3, and the python-based Portage
> >package system runs cleanly on Py2.6, Py3.1 and Py3.2.
> 
> We're trying to get there for Ubuntu (driven also by Debian).  We have Python
> 3.1.2 in main for Lucid, though we will probably not get 3.2 into Maverick
> (the October 2010 release).  We're currently concentrating on Python 2.7 as a
> supported version because it'll be released by then, while 3.2 will still be
> in beta.
> 
> If you want to help, or have complaints, kudos, suggestions, etc. for Python
> support on Ubuntu, you can contact me off-list.
> 
 Fedora 14 is about the same.  A nice to have thing that goes along
with these would be a table that has packages ported to python3 and which
distributions have the python3 version of the package.

Once most of the important third-party packages are ported to python3 and in
the distributions, this table will likely become outdated and probably
should be reaped, but right now it's a very useful thing to see.

-Toshio



Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Arc Riley

Personally, I'd like to celebrate the upcoming Python 3.2 release (which
will hopefully include 3to2) with moving all packages which do not have the
'Programming Language :: Python :: 3' classifier to a "Legacy" section of
PyPI and offer only Python 3 packages otherwise.  Of course put a banner at
the top clearly explaining that Python 2 packages can be found in the Legacy
section.

Radical, I know, but at some point we really need to make this move.

PyPI really needs a mechanism to cull out the moribund packages from being
displayed next to the actively maintained ones.  There are so many packages on
there that only work on Python 2.2-2.4 (for example), or with a specific
highly outdated version of another package, etc.

On Mon, Jun 21, 2010 at 11:13 AM, Stephan Richter  wrote:

> On Monday, June 21, 2010, Nick Coghlan wrote:
> > A decent listing of major packages that already support Python 3 would
> > be very handy for the new Python2orPython3 page I created on the wiki,
> > and easier to keep up-to-date. (the old Early2to3Migrations page
> > didn't look particularly up to date, but hopefully we can keep the new
> > list in a happier state).
>
> I really just want to be able to go to PyPI, Click on "Browse packages" and
> then select "Python 3" (it can currently be accomplished by clicking
> "Python" and then "3"). Of course, package developers need to be encouraged to add
> these Trove classifiers so that the listings are as complete as possible.
>
> Regards,
> Stephan
> --
> Entrepreneur and Software Geek
> Google me. "Zope Stephan Richter"
>

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Laurens Van Houtven

On Mon, Jun 21, 2010 at 3:37 PM, Arc Riley  wrote:
> I would suggest that if packages that do not have Python 3 support yet are
> listed, then their alternatives should also.

Okay, this is being worked on.

> PyQt has had Py3 support for some time.

Added, as well as PySide.

> PostgreSQL and SQLite do (as does SQLAlchemy)

wrt Postgres: Is that psycopg2? Not sure what that's an alternative
to, since the 2.x list doesn't have any ORMs or database APIs at the
moment (unless Django counts).

> CherryPy has had Py3 support for the last release cycle

Okay, going to add it but can't right now because lots of people are editing.

> libxml2 does not, but lxml does.

That's okay, I don't think many people seriously use python-libxml2
anyway (using lxml instead) :-) Again, not sure what that would be an
alternative for though?

> Also, under where it mentions that most OS's do not include Python 3, it
> should be noted which have good support for it.  Gentoo (for example) has
> excellent support for Python 3, automatically installing Python packages
> which have Py3 support for both Py2 and Py3, and the python-based Portage
> package system runs cleanly on Py2.6, Py3.1 and Py3.2.

As Barry has pointed out, 3.x is in many distros now, so in order not to
make people angry that their distro, which also does the Right Thing,
isn't mentioned (what does Arch do? py3k is easily available from AUR;
that's not really ArchLinux proper, but every Arch user I've ever
talked to considers AUR an integral part), I added this:
"""
Also, quite a few distributions have Python 3.x available already for
end-users, even if it's not the default interpreter.
"""
I think that would make everyone happy, and the wiki article that much
more maintainable.

Thanks for your input,
Laurens

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Laurens Van Houtven

On Mon, Jun 21, 2010 at 5:28 PM, Toshio Kuratomi  wrote:
>  Fedora 14 is about the same.  A nice to have thing that goes along
> with these would be a table that has packages ported to python3 and which
> distributions have the python3 version of the package.

Yeah, this is exactly why I'd prefer to not have to maintain a
specific list. Big distros are making Python 3.x available, it's not
the default interpreter yet anywhere (AFAIK?), but that's going to
happen in the next few releases of said distributions.

On Mon, Jun 21, 2010 at 5:31 PM, Arc Riley  wrote:
> Personally, I'd like to celebrate the upcoming Python 3.2 release (which
> will hopefully include 3to2) with moving all packages which do not have the
> 'Programming Language :: Python :: 3' classifier to a "Legacy" section of
> PyPI and offer only Python 3 packages otherwise.  Of course put a banner at
> the top clearly explaining that Python 2 packages can be found in the Legacy
> section.
>
> Radical, I know, but at some point we really need to make this move.

I agree we have to make it at some point but I feel this is way, way too early.

thanks for your continued input,
Laurens

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:

>Something that may make sense to ease the porting process is for some
>of these "on the boundary" I/O related string manipulation functions
>(such as os.path.join) to grow "encoding" keyword-only arguments. The
>recommended approach would be to provide all strings, but bytes could
>also be accepted if an encoding was specified. (If you want to mix
>encodings - tough, do the decoding yourself).

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have "encoding-carrying" bytes and str types?
Basically, I'm thinking of types (maybe even the current ones) that carry
around a .encoding attribute so that they can be automatically encoded and
decoded where necessary.  This at least would simplify APIs that need to do
the conversion.

By default, the .encoding attribute would be some marker to indicate "I have
no idea, do it explicitly" and if you combine ebytes or estrs that have
incompatible encodings, you'd either throw an exception or reset the .encoding
to IAmConfuzzled.  But say you had an email header like:

=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=

And code like the following (made less crappy):

-snip snip-
class ebytes(bytes):
    # Class-level default; set per instance to the real encoding.
    encoding = 'ascii'

    def __str__(self):
        # Decode with the carried encoding and hand it on to the text.
        s = estr(self.decode(self.encoding))
        s.encoding = self.encoding
        return s

class estr(str):
    encoding = 'ascii'

s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa',
        'euc-jp')
b = bytes(s, 'euc-jp')

eb = ebytes(b)
eb.encoding = 'euc-jp'
es = str(eb)    # calls ebytes.__str__, which returns an estr
print(repr(eb), es, es.encoding)
-snip snip-

Running this you get:

b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド！ euc-jp

Would it be feasible?  Dunno.  Would it help ease the bytes/str confusion?
Dunno.  But I think it would help make APIs easier to design and use because
it would cut down on the encoding-keyword function signature infection.
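For comparison, the stdlib already has something close to an "encoding-carrying bytes" value for this exact case. `email.header.decode_header` returns a list of (bytes, charset) pairs. The sketch below applies it to the header above. It does not show how unencoded or mixed headers behave.

```python
from email.header import decode_header

header = '=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?='

# The single encoded word comes back as raw bytes paired with its charset.
[(raw, charset)] = decode_header(header)
print(raw, charset)
print(raw.decode(charset))
```

Here the charset travels alongside the bytes rather than on them, so the caller still decides when to decode.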

-Barry


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Michael Urman

On Mon, Jun 21, 2010 at 09:51, P.J. Eby  wrote:
> The issue is, I'd like to have an idempotent incantation that I can use to
> make the inputs and outputs to stdlib functions behave in a type-safe manner
> with respect to bytes, in cases where bytes are really what I want operated
> on.
>
> Note too that this is an argument for symmetry in wrapping the inputs and
> outputs, so that the code doesn't have to "know" what it's dealing with!

It is somewhat troublesome that there doesn't appear to be an obvious
built-in idempotent-when-possible function that gives back the
provided bytes/str, or converts to the requested type per the listed
encoding (as of 3.1.2). Would it be useful to make the second versions
of these work, or would that return us to the confusion of the 2.x
era? On the other hand, since these are all TypeErrors instead of
UnicodeErrors, it's an easy wrapper to write.

>>> bytes('abc', 'latin-1')
b'abc'
>>> bytes(b'abc', 'latin-1')
TypeError: encoding or errors without a string argument

>>> str(b'abc', 'latin-1')
'abc'
>>> str('abc', 'latin-1')
TypeError: decoding str is not supported
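
Because those errors depend only on the argument's type, the easy wrapper mentioned above can dispatch on type before converting. A minimal sketch; ensure_bytes and ensure_str are hypothetical names, not stdlib functions:

```python
def ensure_bytes(value, encoding='utf-8', errors='strict'):
    """Return bytes unchanged; encode a str with the given encoding (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


def ensure_str(value, encoding='utf-8', errors='strict'):
    """Return str unchanged; decode bytes with the given encoding (hypothetical helper)."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


# Both variants from the session above now succeed.
print(ensure_bytes('abc', 'latin-1'), ensure_bytes(b'abc', 'latin-1'))
print(ensure_str(b'abc', 'latin-1'), ensure_str('abc', 'latin-1'))
```

An already-converted value comes back untouched, so the encoding argument only matters when a conversion actually happens.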

Interestingly the online docs for str say it can decode either a byte
string or a character buffer, a term which doesn't yield a definition
in a search; apparently either a string is not a character buffer, or
the docs are incorrect.
http://docs.python.org/py3k/library/functions.html?highlight=str#str

However it looks like this is consistent with int.
>>> int(4, 0)
TypeError: int() can't convert non-string with explicit base

-- 
Michael Urman

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Stephen J. Turnbull

Lennart Regebro writes:

 > 2010/6/21 Stephen J. Turnbull:
 > > IMO, the UI is right.  "Something" like the above "ought" to work.
 > 
 > Right. That said, many times when you want to do urlparse etc they
 > might be binary, and you might want binary. So maybe the methods
 > should work with both?

First, a caveat: I'm a Unicode/encodings person, not an experienced
web programmer.  My opinions on whether this would work well in
practice should be taken with a grain of salt.

Speaking for myself, I live in a country where the natives have
saddled themselves with no less than 4 encodings in common use, and I
would never want "binary" since none of them would display as anything
useful in a traceback.  Wherever possible, I decode "blobs" into
structured objects, I do it as soon as possible, and if for efficiency
reasons I want to do this lazily, I store the blob in a separate
.raw_object attribute.  If they're textual, I decode them to text.  I
can't see an efficiency argument for decoding URIs lazily in most
applications.

In the case of structured text like URIs, I would create a separate
class for handling them with string-like operations.  Internally, all
text would be raw Unicode (ie, not url-encoded); repr(uri) would use
some kind of readable quoting convention (not url-encoding) to
disambiguate random reserved characters from separators, while
str(uri) would produce an url-encoded string.  Converting to and from
wire format is just .encode and .decode, then, and in this country you
need to be flexible about which encoding you use.

Agreed, this stuff is really annoying.  But I think that just comes
with the territory.  PJE reports that folks don't like doing encoding
and decoding all over the place.  I understand that, but if they're
doing a lot of that, I have to wonder why.  Why not define the one
line function and get on with life?

The thing is, where I live, it's not going to be a one line function.
I'm going to be dealing with URLs that are url-encoded representations
of UTF-8, Shift-JIS, EUC-JP, and occasionally RFC 2047!  So I need an
API that explicitly encodes and decodes.  And I need an API that
presents Japanese as Japanese rather than as line noise.
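
To make the multiple-encodings point concrete: the same two-character word percent-encodes differently under each of the encodings named above, and only the matching codec gets it back. A small illustration using urllib.parse.quote() and unquote() (the word is an arbitrary example):

```python
from urllib.parse import quote, unquote

word = '日本'  # "Japan"

# The same text percent-encodes differently depending on the byte encoding.
forms = {enc: quote(word, encoding=enc) for enc in ('utf-8', 'shift_jis', 'euc_jp')}
for enc, form in forms.items():
    print(enc, form)

# Decoding needs the right encoding.  unquote() defaults to UTF-8 with
# errors='replace', so a Shift-JIS path comes back as replacement characters.
print(unquote(forms['shift_jis'], encoding='shift_jis'))
print(unquote(forms['shift_jis']))
```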

Eg, PJE writes

    Ugh.  I meant:

    newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')

    Which just goes to the point of how ridiculous it is to have to
    convert things to strings and back again to use APIs that ought to
    just handle bytes properly in the first place.

But if you need that "everywhere", what's so hard about

def urljoin_wrapper (base, subdir):
    return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')

Now, note how that pattern fails as soon as you want to use
non-ISO-8859-1 languages for subdir names.  In Python 3, the code
above is just plain buggy, IMHO.  The original author probably will
never need the generalization.  But her name will be cursed unto the
nth generation by people who use her code on a different continent.
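
The failure is easy to reproduce. In the sketch below, urljoin_wrapper is the function from the message, and urljoin_bytes is a hypothetical variant (not proposed in the thread) that percent-escapes the subdirectory in an explicitly chosen encoding before joining, so the latin-1 round trip only has to carry ASCII plus the original base bytes. The caller still has to know which encoding the server expects.

```python
from urllib.parse import quote, urljoin


def urljoin_wrapper(base, subdir):
    # latin-1 maps every byte to one code point, so `base` round-trips,
    # but any subdir character above U+00FF cannot be encoded back.
    return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')


def urljoin_bytes(base, subdir, encoding='utf-8'):
    # Hypothetical variant: escape subdir to ASCII first, in an explicit encoding.
    escaped = quote(subdir, safe='/', encoding=encoding)
    return urljoin(str(base, 'latin-1'), escaped).encode('latin-1')


print(urljoin_wrapper(b'http://example.com/a/', 'b'))
try:
    urljoin_wrapper(b'http://example.com/', '日本')
except UnicodeEncodeError as exc:
    print('fails:', exc.reason)
print(urljoin_bytes(b'http://example.com/', '日本'))
print(urljoin_bytes(b'http://example.com/', '日本', encoding='euc_jp'))
```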

The net result is that bytes are *not* a programmer- or user-friendly
way to do this, except for the minority of the world for whom Latin-1
is a good approximation to their daily-use unibyte encoding (eg, it's
probably usable for debugging in Dansk, but you won't win any
popularity contests in Tel Aviv or Shanghai).


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Terry Reedy


On 6/21/2010 8:33 AM, Nick Coghlan wrote:


> P.S. (We're going to have a tough decision to make somewhere along the
> line where docs.python.org is concerned, too - when do we flick the
> switch and make a 3.x version of the docs the default?


Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'.
Trunk releases always take over docs.python.org. To do otherwise would
be to say that 3.2 is not a real trunk release and not yet ready for
real use -- a major slam.


Actually, I thought this was already discussed and decided ;-).

Terry Jan Reedy


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi

On Mon, Jun 21, 2010 at 11:43:07AM -0400, Barry Warsaw wrote:
> On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
> 
> >Something that may make sense to ease the porting process is for some
> >of these "on the boundary" I/O related string manipulation functions
> >(such as os.path.join) to grow "encoding" keyword-only arguments. The
> >recommended approach would be to provide all strings, but bytes could
> >also be accepted if an encoding was specified. (If you want to mix
> >encodings - tough, do the decoding yourself).
> 
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
> for it.
> 
> Would it make sense to have "encoding-carrying" bytes and str types?
> Basically, I'm thinking of types (maybe even the current ones) that carry
> around a .encoding attribute so that they can be automatically encoded and
> decoded where necessary.  This at least would simplify APIs that need to do
> the conversion.
> 
> By default, the .encoding attribute would be some marker to indicate "I have
> no idea, do it explicitly" and if you combine ebytes or estrs that have
> incompatible encodings, you'd either throw an exception or reset the .encoding
> to IAmConfuzzled.  But say you had an email header like:
> 
> =?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=
> 
> And code like the following (made less crappy):
> 
> -snip snip-
> class ebytes(bytes):
>     encoding = 'ascii'
> 
>     def __str__(self):
>         s = estr(self.decode(self.encoding))
>         s.encoding = self.encoding
>         return s
> 
> 
> class estr(str):
>     encoding = 'ascii'
> 
> 
> s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
> b = bytes(s, 'euc-jp')
> 
> eb = ebytes(b)
> eb.encoding = 'euc-jp'
> es = str(eb)
> print(repr(eb), es, es.encoding)
> -snip snip-
> 
> Running this you get:
> 
> b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド！ euc-jp
> 
> Would it be feasible?  Dunno.  Would it help ease the bytes/str confusion?
> Dunno.  But I think it would help make APIs easier to design and use because
> it would cut down on the encoding-keyword function signature infection.
> 

I like the idea of having encoding information carried with the data.
I don't think that an ebytes type that can *optionally* have an encoding
attribute makes the situation less confusing, though.  To me the biggest
problem with python-2.x's unicode/bytes handling was not that it threw
exceptions but that it didn't always throw exceptions.  You might test this
in python2::

    t = u'cafe'
    function(t)

And say, ah, my code works.  Then a user gives it this::

    t = u'café'
    function(t)

And they get a unicode error because the function only works with unicode in
the ascii range.

ebytes seems to have the same pitfall, where the code path exercised by your
tests could work with::

    eb = ebytes(b)
    eb.encoding = 'euc-jp'
    function(eb)

but the user exercises a code path that does this and fails::

    eb = ebytes(b)
    function(eb)

What do you think of making the encoding attribute a mandatory part of
creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
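
A minimal sketch of that suggestion, assuming ebytes is a bytes subclass whose constructor simply has no default for the encoding (the class is hypothetical). Forgetting the encoding then fails at construction on every code path, not only on the path a user happens to exercise:

```python
class ebytes(bytes):
    """Hypothetical bytes subclass whose encoding must be given up front."""

    def __new__(cls, data, encoding):
        # No default for `encoding`: omitting it is a TypeError right here.
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __str__(self):
        # Decoding never has to guess.
        return self.decode(self.encoding)


b = b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa'
eb = ebytes(b, 'euc-jp')
print(str(eb), eb.encoding)

try:
    ebytes(b)  # the forgotten-encoding path from the example above
except TypeError as exc:
    print('rejected:', exc)
```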

-Toshio



Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Terry Reedy


On 6/21/2010 11:31 AM, Arc Riley wrote:

> Personally, I'd like to celebrate the upcoming Python 3.2 release (which
> will hopefully include 3to2) with moving all packages which do not have
> the 'Programming Language :: Python :: 3' classifier to a "Legacy"
> section of PyPI and offer only Python 3 packages otherwise.  Of course
> put a banner at the top clearly explaining that Python 2 packages can be
> found in the Legacy section.


I do not think 2.x should be dissed any more than 3.x, which is to say, 
not at all. The impression I got from lurking on #python last night, in 
between disconnects, is that at least a couple of people feel that there 
is a move afoot to push people to Python3. Whether that had any 
connection to discussions here, I could not tell.


Having pypi.python.org/py2 and pypi.python.org/py3 though might be a 
good idea. Inquiries from either url would automatically filter. The 
counterargument is that there may be people looking for packages 
available for *both*.



> Radical, I know, but at some point we really need to make this move.
>
> PyPI really needs a mechanism to cull out the moribund packages from
> being displayed next to the actively maintained ones.


The default ordering for search results is by rating.

> There's so many packages on there that only work on Python 2.2-2.4 (for
> example), or with a specific highly outdated version of another package,
> etc.


And there are people running those versions. I think better 
classification and filtering is the answer, though hard to mandate.


Terry Jan Reedy


Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby


At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:

> It may be that there are places where we need to rewrite standard
> library algorithms to be bytes/str neutral (e.g. by using length one
> slices instead of indexing). It may be that there are more APIs that
> need to grow "encoding" keyword arguments that they then pass on to
> the functions they call or use to convert str arguments to bytes (or
> vice-versa). But without people trying to port affected libraries and
> reporting bugs when they find issues, the situation isn't going to
> improve.
>
> Now, if these bugs are already being reported against 3.1 and just
> aren't getting fixed, that's a completely different story...


The overall impression, though, is that this isn't really a step 
forward.  Now, bytes are the special case instead of unicode, but 
that special case isn't actually handled any better by the stdlib - 
in fact, it's arguably worse.  And, the burden of addressing this 
seems to have been shifted from the people who made the change, to 
the people who are going to use it.  But those people are not 
necessarily in a position to tell you anything more than, "give me 
something that works with bytes".


What I can tell you is that before, since string constants in the 
stdlib were ascii bytes, and transparently promoted to unicode, 
stdlib behavior was *predictable* in the presence of special cases: 
you got back either bytes or unicode, but either way, you could 
idempotently upgrade the result to unicode, or just pass it on.  APIs 
were "str safe, unicode aware".  If you passed in bytes, you weren't 
going to get unicode without a warning, and if you passed in unicode, 
it'd work and you'd get unicode back.


Now, the APIs are neither safe nor aware -- if you pass bytes in, you 
get unpredictable results back.


Ironically, it almost *would* have been better if bytes simply didn't
work as strings at all, *ever*, but you could wrap them with a
bstr() to *treat* them as text.  You could still have restrictions on
combining them, as long as it was a restriction on the unicode you
mixed with them.  That is, you could combine a bstr and a str only if
the *str* was restricted to ASCII.


If we had the Python 3 design discussions to do over again, I think I 
would now have stuck with the position of not letting bytes be 
string-compatible at all, and instead proposed an explicit bstr() 
wrapper/adapter to use them as strings, that would (in that case) 
force coercion in the direction of bytes rather than strings.  (And 
bstr need not have been a builtin - it could have been something you 
import, to help discourage casual usage.)
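
A rough pure-Python sketch of such a bstr() adapter, assuming the coercion rule described above: a str may be mixed in only if it is ASCII, and the result stays on the bytes side. The class is hypothetical and deliberately minimal (only concatenation is shown); it wraps bytes rather than changing the built-in type:

```python
class bstr:
    """Hypothetical adapter that treats a bytes object as text."""

    __slots__ = ('_data',)

    def __init__(self, data):
        if isinstance(data, bstr):
            data = data._data          # idempotent: bstr(bstr(x)) is fine
        if not isinstance(data, bytes):
            raise TypeError('bstr() wraps bytes, not %s' % type(data).__name__)
        self._data = data

    @staticmethod
    def _as_bytes(other):
        # Coercion goes toward bytes; a str is accepted only if it is ASCII.
        if isinstance(other, bstr):
            return other._data
        if isinstance(other, bytes):
            return other
        if isinstance(other, str):
            return other.encode('ascii')   # UnicodeEncodeError for non-ASCII
        raise TypeError('cannot combine bstr with %s' % type(other).__name__)

    def __add__(self, other):
        return bstr(self._data + self._as_bytes(other))

    def __radd__(self, other):
        return bstr(self._as_bytes(other) + self._data)

    def __bytes__(self):
        return self._data

    def __repr__(self):
        return 'bstr(%r)' % (self._data,)


path = bstr(b'/wiki/') + 'FrontPage'
print(path, bytes(path))
try:
    bstr(b'/wiki/') + 'Caf\xe9'
except UnicodeEncodeError:
    print('non-ASCII str rejected')
```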


Might this approach lead to some people doing things wrong in the 
case of porting?  Sure.  But there'd be little reason to use it in 
new code that didn't have a real need for bytestring manipulation.


It might've been a better balance between practicality and purity, in 
that it keeps the language pure, while offering a practical way to 
deal with things in bytes if you really need to.  And, bytes wouldn't 
silently succeed *some* of the time, leading to a trap.  An easy 
inconsistency is worse than a bit of uniform chicken-waving.


Is it too late to make that tradeoff?  Probably.  Certainly it's not 
practical to *implement* outside the language core, and removing 
string methods would fux0r anybody whose currently-ported code relies 
on bytes objects having string-like methods.



Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Michael Foord


On 21/06/2010 17:46, P.J. Eby wrote:

> Is it too late to make that tradeoff? Probably. Certainly it's not
> practical to *implement* outside the language core, and removing
> string methods would fux0r anybody whose currently-ported code relies
> on bytes objects having string-like methods.




Why is your proposed bstr wrapper not practical to implement outside the 
core and use in your own libraries and frameworks?


Michael






--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby


At 01:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:

> But if you need that "everywhere", what's so hard about
>
> def urljoin_wrapper (base, subdir):
>     return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')
>
> Now, note how that pattern fails as soon as you want to use
> non-ISO-8859-1 languages for subdir names.


Bear in mind that the use cases I'm talking about here are WSGI 
stacks with components written by multiple authors -- each of whom 
may have to define that function, and still get it right.


Sure, there are some things that could go in wsgiref in the 
stdlib.  However, as of this moment, there's only a very uneasy rough 
consensus in Web-Sig as to how the heck WSGI should actually *work* 
on Python 3, because of issues like these.


That makes it tough to actually say what should happen in the stdlib 
-- e.g., which things should be classed as stdlib bugs, which things 
should be worked around with wrappers or new functions, etc.



[Python-Dev] [RELEASED] Python 2.7 release candidate 2

2010-06-21 Thread Benjamin Peterson

On behalf of the Python development team, I'm tickled pink to announce the
second release candidate of Python 2.7.

Python 2.7 is scheduled (by Guido and Python-dev) to be the last major version
in the 2.x series. However, 2.7 will have an extended period of bugfix
maintenance.

2.7 includes many features that were first released in Python 3.1. The faster io
module, the new nested with statement syntax, improved float repr, set literals,
dictionary views, and the memoryview object have been backported from 3.1. Other
features include an ordered dictionary implementation, unittest improvements, a
new sysconfig module, auto-numbering of fields in the str/unicode format method,
and support for ttk Tile in Tkinter.  For a more extensive list of changes in
2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python
distribution.

To download Python 2.7 visit:

 http://www.python.org/download/releases/2.7/

While this is a preview release and is thus not suitable for production use, we
strongly encourage Python application and library developers to test the release
with their code and report any bugs they encounter to:

 http://bugs.python.org/

This helps ensure that those upgrading to Python 2.7 will encounter as few bumps
as possible.

2.7 documentation can be found at:

 http://docs.python.org/2.7/


Enjoy!

--
Benjamin Peterson
Release Manager
benjamin at python.org
(on behalf of the entire python-dev team and 2.7's contributors)

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote:

> On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
> >Something that may make sense to ease the porting process is for some
> >of these "on the boundary" I/O related string manipulation functions
> >(such as os.path.join) to grow "encoding" keyword-only arguments. The
> >recommended approach would be to provide all strings, but bytes could
> >also be accepted if an encoding was specified. (If you want to mix
> >encodings - tough, do the decoding yourself).
>
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
> for it.
>
> Would it make sense to have "encoding-carrying" bytes and str types?

It's not a stupid idea, and could potentially work.  It also might 
have a better chance of being able to actually be *implemented* in 
3.x than my idea.

> Basically, I'm thinking of types (maybe even the current ones) that carry
> around a .encoding attribute so that they can be automatically encoded and
> decoded where necessary.  This at least would simplify APIs that need to do
> the conversion.

I'm not really sure how much use the encoding is on a unicode object 
- what would it actually mean?

Hm. I suppose it would effectively mean "this string can be 
represented in this encoding" -- which is useful, in that you could 
fail operations when combining with bytes of a different encoding.

Hm... no, in that case you should just encode the string to the 
bytes' encoding, and let that throw an error if it fails.  So, 
really, there's no reason for a string to know its encoding.  All you 
need is the bytes type to have an encoding attribute, and when doing 
mixed-type operations between bytes and strings, coerce to *bytes of 
the same encoding*.

However, if .encoding is None, then coercion would follow the same 
rules as now -- i.e., convert the bytes to unicode, assuming an ascii 
encoding.  (This would be different than setting an encoding of 
'ascii', because in that case, it means you want cross-type 
operations to result in ascii bytes, rather than a unicode string,
and to fail if the unicode part can't be encoded appropriately.  The 
'None' setting is effectively a nod to compatibility with prior 3.x 
versions, since I assume we can't just throw out the old coercion behavior.)

Then, a few more changes to the bytes type would round out the implementation:

* Allow .decode() to not specify an encoding, unless .encoding is None

* Add back in the missing string methods (e.g. .encode(), since you
can transparently upgrade to a string)

* Smart __str__, as shown in your proposal.
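
A minimal sketch of the bytes-side rules above, prototyped (as an assumption) as a bytes subclass named ebytes rather than as a change to bytes itself; the missing string methods are not shown. For .encoding None the message keeps the existing coercion rules; the sketch models that by deferring to built-in bytes, which in Python 3 means mixing with str raises TypeError. Combining two different known encodings raises ValueError, one of the options mentioned earlier in the thread.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass carrying an optional .encoding (None = unknown)."""

    def __new__(cls, data=b'', encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def decode(self, encoding=None, errors='strict'):
        # .decode() may omit the encoding unless .encoding is None.
        if encoding is None:
            if self.encoding is None:
                raise TypeError('no encoding given and .encoding is None')
            encoding = self.encoding
        return bytes.decode(self, encoding, errors)

    def __str__(self):
        # "Smart" __str__: decode using the carried encoding.
        return self.decode()

    def _merged(self, other):
        # Two different known encodings cannot be combined.
        theirs = getattr(other, 'encoding', None)
        if self.encoding and theirs and theirs != self.encoding:
            raise ValueError('cannot mix %s and %s bytes' % (self.encoding, theirs))
        return self.encoding or theirs

    def __add__(self, other):
        if isinstance(other, str):
            if self.encoding is None:
                return NotImplemented   # unknown encoding: built-in behaviour
            # Known encoding: coerce the str to bytes of the *same* encoding.
            return ebytes(bytes(self) + other.encode(self.encoding), self.encoding)
        if isinstance(other, bytes):
            return ebytes(bytes(self) + bytes(other), self._merged(other))
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            if self.encoding is None:
                return NotImplemented
            return ebytes(other.encode(self.encoding) + bytes(self), self.encoding)
        if isinstance(other, bytes):
            return ebytes(bytes(other) + bytes(self), self._merged(other))
        return NotImplemented


eb = ebytes(b'\xa5\xcf', 'euc-jp')
combined = eb + '\u30ed'   # the str is encoded as EUC-JP; the result stays bytes
print(repr(combined), combined.encoding, str(combined))
```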

> Would it be feasible?  Dunno.

Probably, although it might mean adding back in special cases that 
were previously taken out, and a few new ones.

> Would it help ease the bytes/str confusion?  Dunno.

Not sure what confusion you mean -- Web-SIG and I at least are not 
confused about the difference between bytes and str, or we wouldn't 
be having an issue.  ;-)  Or maybe you mean the stdlib's API 
confusion?  In which case, yes, definitely!

> But I think it would help make APIs easier to design and use because
> it would cut down on the encoding-keyword function signature infection.

Not only that, but I believe it would also retroactively make the 
stdlib's implementation of those APIs "correct" again, and give us 
One Obvious Way to work with bytes of a known encoding, while 
constraining any unicode that gets combined with those bytes to be 
validly encodable.  It also gives you an idempotent constructor for 
bytes of a specified encoding, that can take either a bytes of 
unspecified encoding, a bytes of the correct encoding, or a string 
that can be encoded as such.

In short, +1.  (I wish it were possible to go back and make bytes 
non-strings and have only this ebytes or bstr or whatever type have 
string methods, but I'm pretty sure that ship has already sailed.)


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby


At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:

> What do you think of making the encoding attribute a mandatory part of
> creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).


As long as the coercion rules force str+ebytes (or str % ebytes, 
ebytes % str, etc.) to result in another ebytes (and fail if the str 
can't be encoded in the ebytes' encoding), I'm personally fine with 
it, although I really like the idea of tacking the encoding to bytes 
objects in the first place.


OTOH, one potential problem with having the encoding on the bytes 
object rather than the ebytes object is that then you can't easily 
take bytes from a socket and then say what encoding they are, without 
interfering with the sockets API (or whatever other place you get the 
bytes from).


So, on balance, making ebytes a separate type (perhaps one that's 
just a pointer to the bytes and a pointer to the encoding) would 
indeed make more sense.  It having different coercion rules for 
interacting with strings would make more sense too in that 
case.  (The ideal, of course, would still be to not let bytes objects 
be stringlike at all, with only ebytes acting string-like.  That way, 
you'd be forced to be explicit about your encoding when working with 
bytes, but all you'd need to do was make an ebytes call.)
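
A minimal sketch of ebytes as a separate type that just pairs a reference to existing bytes with an encoding, so bytes from a socket can be labelled after the fact without touching that API. The class and its exact rules are assumptions drawn from the messages above: the encoding is mandatory, the constructor is idempotent (plain bytes, an ebytes already in that encoding, or an encodable str), and mixing with a str yields another ebytes or fails:

```python
class ebytes:
    """Hypothetical wrapper: a reference to some bytes plus their encoding."""

    __slots__ = ('data', 'encoding')

    def __init__(self, value, encoding):
        # Idempotent constructor: plain bytes, an ebytes already in this
        # encoding, or a str that can be encoded in it.
        # (Encoding names are compared as given; no codec-name normalisation.)
        if isinstance(value, ebytes):
            if value.encoding != encoding:
                raise ValueError('ebytes is %s, not %s' % (value.encoding, encoding))
            value = value.data
        elif isinstance(value, str):
            value = value.encode(encoding)   # UnicodeEncodeError if impossible
        elif not isinstance(value, bytes):
            raise TypeError('expected bytes, ebytes or str')
        self.data = value
        self.encoding = encoding

    def __add__(self, other):
        # ebytes + str stays on the bytes side; plain bytes must be labelled first.
        if not isinstance(other, (str, ebytes)):
            return NotImplemented
        return ebytes(self.data + ebytes(other, self.encoding).data, self.encoding)

    def __radd__(self, other):
        # str + ebytes likewise produces an ebytes.
        if not isinstance(other, str):
            return NotImplemented
        return ebytes(ebytes(other, self.encoding).data + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)

    def __repr__(self):
        return 'ebytes(%r, %r)' % (self.data, self.encoding)


packet = b'\xa5\xcf\xa5\xed'       # e.g. bytes just read from a socket
eb = ebytes(packet, 'euc-jp')      # label them; the socket API is untouched
print(repr(eb + '\u30fc'), str(eb + '\u30fc'))
```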



Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Toshio Kuratomi

On Tue, Jun 22, 2010 at 01:08:53AM +0900, Stephen J. Turnbull wrote:
> Lennart Regebro writes:
> 
>  > 2010/6/21 Stephen J. Turnbull:
>  > > IMO, the UI is right.  "Something" like the above "ought" to work.
>  > 
>  > Right. That said, many times when you want to do urlparse etc they
>  > might be binary, and you might want binary. So maybe the methods
>  > should work with both?
> 
> First, a caveat: I'm a Unicode/encodings person, not an experienced
> web programmer.  My opinions on whether this would work well in
> practice should be taken with a grain of salt.
> 
> Speaking for myself, I live in a country where the natives have
> saddled themselves with no less than 4 encodings in common use, and I
> would never want "binary" since none of them would display as anything
> useful in a traceback.  Wherever possible, I decode "blobs" into
> structured objects, I do it as soon as possible, and if for efficiency
> reasons I want to do this lazily, I store the blob in a separate
> .raw_object attribute.  If they're textual, I decode them to text.  I
> can't see an efficiency argument for decoding URIs lazily in most
> applications.
> 
> In the case of structured text like URIs, I would create a separate
> class for handling them with string-like operations.  Internally, all
> text would be raw Unicode (ie, not url-encoded); repr(uri) would use
> some kind of readable quoting convention (not url-encoding) to
> disambiguate random reserved characters from separators, while
> str(uri) would produce an url-encoded string.  Converting to and from
> wire format is just .encode and .decode, then, and in this country you
> need to be flexible about which encoding you use.
> 
> Agreed, this stuff is really annoying.  But I think that just comes
> with the territory.  PJE reports that folks don't like doing encoding
> and decoding all over the place.  I understand that, but if they're
> doing a lot of that, I have to wonder why.  Why not define the one
> line function and get on with life?
> 
> The thing is, where I live, it's not going to be a one line function.
> I'm going to be dealing with URLs that are url-encoded representations
> of UTF-8, Shift-JIS, EUC-JP, and occasionally RFC 2047!  So I need an
> API that explicitly encodes and decodes.  And I need an API that
> presents Japanese as Japanese rather than as line noise.
> 
> Eg, PJE writes
> 
> Ugh.  I meant: 
> 
> newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')
> 
> Which just goes to the point of how ridiculous it is to have to  
> convert things to strings and back again to use APIs that ought to  
> just handle bytes properly in the first place. 
> 
> But if you need that "everywhere", what's so hard about
> 
> def urljoin_wrapper (base, subdir):
>     return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')
> 
> Now, note how that pattern fails as soon as you want to use
> non-ISO-8859-1 languages for subdir names.  In Python 3, the code
> above is just plain buggy, IMHO.  The original author probably will
> never need the generalization.  But her name will be cursed unto the
> nth generation by people who use her code on a different continent.
> 
> The net result is that bytes are *not* a programmer- or user-friendly
> way to do this, except for the minority of the world for whom Latin-1
> is a good approximation to their daily-use unibyte encoding (eg, it's
> probably usable for debugging in Dansk, but you won't win any
> popularity contests in Tel Aviv or Shanghai).
> 

One comment here -- you can also have URIs that aren't decodable into their
true textual meaning using a single encoding.

Apache will happily serve out URIs that have utf-8, shift-jis, and euc-jp
components inside of their path, but the textual representation that was
intended will be garbled (or be represented by escaped byte sequences).  For
that matter, Apache will serve requests that have no true textual
representation, as it is working on the byte level rather than the character
level.

So a complete solution really should allow the programmer to pass in URIs as
bytes when the programmer knows that they need it.
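
An illustration of such a path using only stdlib calls: in the made-up path below the first segment is UTF-8 and the second Shift-JIS, so no single codec decodes all of it, yet the raw bytes round-trip exactly. The surrogateescape error handler (PEP 383, Python 3.1+) is one way to get a lossless str view:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Made-up path: segment 1 is UTF-8, segment 2 is Shift-JIS, both spelling 日本.
path = '/%E6%97%A5%E6%9C%AC/%93%FA%96%7B'
raw = unquote_to_bytes(path)

# No single encoding decodes the whole path...
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print('utf-8 fails at byte', exc.start)

# ...but each segment decodes if you know its own encoding.
_, first, second = raw.split(b'/')
print(first.decode('utf-8'), second.decode('shift_jis'))

# Working on bytes keeps the path intact: re-escaping gives the original.
print(quote_from_bytes(raw, safe='/') == path)

# A lossless str view is possible with surrogateescape.
text = raw.decode('utf-8', 'surrogateescape')
print(text.encode('utf-8', 'surrogateescape') == raw)
```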

-Toshio



Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Terry Reedy

On 6/20/2010 11:56 PM, Terry Reedy wrote:

The specific example is

 >>> urllib.parse.parse_qsl('a=b%e0')
[('a', 'b�')]

where the character after 'b' is a white question mark in a dark diamond
(U+FFFD, the Unicode replacement character), indicating a decoding error.

parse_qsl() splits that input on '=' and sends each piece to
urllib.parse.unquote().  unquote() attempts to "Replace %xx escapes by
their single-character equivalent."  unquote has an encoding parameter
that defaults to 'utf-8' in *its* call to .decode.  parse_qsl does not
have an encoding parameter.
If it did, and it passed that to unquote, then
the above example would become (simulated interaction)

 >>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1')
[('a', 'bà')]

I got that output by copying the file and adding "encoding='latin-1'" to
the unquote call.

Does this solve this problem?
Has anything like this been added for 3.2?
Should it be?

With a little searching, I found
http://bugs.python.org/issue5468
with Miles Kaufmann's year-old comment "parse_qs and parse_qsl should 
also grow encoding and errors parameters to pass to the underlying 
unquote()". Patch review is needed.
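The behaviour described above can be checked against urllib.parse.unquote(), which takes encoding and errors arguments with the defaults 'utf-8' and 'replace'. Later Python 3 releases also gave parse_qsl() the encoding/errors parameters requested in issue 5468. The last line of this sketch assumes such a release.

```python
from urllib.parse import parse_qsl, unquote

# '%e0' on its own is not valid UTF-8, so the default errors='replace'
# substitutes U+FFFD, the replacement character.
print(ascii(unquote('b%e0')))                   # 'b\ufffd'

# Decoding the same escape as Latin-1 gives 'à' (U+00E0).
print(unquote('b%e0', encoding='latin-1'))      # bà

# parse_qsl() forwards encoding/errors to unquote() in later releases.
print(parse_qsl('a=b%e0', encoding='latin-1'))  # [('a', 'bà')]
```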

Terry Jan Reedy


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Stephan Richter

On Monday, June 21, 2010, Nick Coghlan wrote:
> A decent listing of major packages that already support Python 3 would
> be very handy for the new Python2orPython3 page I created on the wiki,
> and easier to keep up-to-date. (the old Early2to3Migrations page
> didn't look particularly up to date, but hopefully we can keep the new
> list in a happier state).

I really just want to be able to go to PyPI, click on "Browse packages" and
then select "Python 3" (this can currently be accomplished by clicking "Python"
and then "3"). Of course, package developers need to be encouraged to add
these Trove classifiers so that the listings are as complete as possible.

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Guido van Rossum

On Mon, Jun 21, 2010 at 9:46 AM, P.J. Eby wrote:
> At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
>>
>> It may be that there are places where we need to rewrite standard
>> library algorithms to be bytes/str neutral (e.g. by using length one
>> slices instead of indexing). It may be that there are more APIs that
>> need to grow "encoding" keyword arguments that they then pass on to
>> the functions they call or use to convert str arguments to bytes (or
>> vice-versa). But without people trying to port affected libraries and
>> reporting bugs when they find issues, the situation isn't going to
>> improve.
>>
>> Now, if these bugs are already being reported against 3.1 and just
>> aren't getting fixed, that's a completely different story...
>
> The overall impression, though, is that this isn't really a step forward.
>  Now, bytes are the special case instead of unicode, but that special case
> isn't actually handled any better by the stdlib - in fact, it's arguably
> worse.  And, the burden of addressing this seems to have been shifted from
> the people who made the change, to the people who are going to use it.  But
> those people are not necessarily in a position to tell you anything more
> than, "give me something that works with bytes".
>
> What I can tell you is that before, since string constants in the stdlib
> were ascii bytes, and transparently promoted to unicode, stdlib behavior was
> *predictable* in the presence of special cases: you got back either bytes or
> unicode, but either way, you could idempotently upgrade the result to
> unicode, or just pass it on.  APIs were "str safe, unicode aware".  If you
> passed in bytes, you weren't going to get unicode without a warning, and if
> you passed in unicode, it'd work and you'd get unicode back.

Actually, the big problem with Python 2 is that if you mix str and
unicode, things work or crash depending on whether any of the str
objects involved contain non-ASCII bytes.

If one API decides to upgrade to Unicode, the result, when passed to
another API, may well cause a UnicodeError because not all arguments
have had the same treatment.

> Now, the APIs are neither safe nor aware -- if you pass bytes in, you get
> unpredictable results back.

This seems an overgeneralization of a particular bug. There are APIs
that are strictly text-in, text-out. There are others that are
bytes-in, bytes-out. Let's call all those *pure*. For some operations
it makes sense that the API is *polymorphic*, by which I mean that
text-in causes text-out, and bytes-in causes bytes-out. All of these
are fine.

Perhaps there are more situations where a polymorphic API would be
helpful. Such APIs are not always so easy to implement, because they
have to be careful with literals or other constants (and even more so
mutable state) used internally -- but it can be done, and there are
plenty of examples in the stdlib.
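This is a minimal sketch of a polymorphic function in the sense above: the input's type selects the type of every internal literal. last_component is a hypothetical name used only for illustration. In the stdlib, os.path.basename() behaves the same way, returning bytes for bytes input and str for str input.

```python
def last_component(path):
    # Choose the separator literal to match the argument's type, so that
    # text-in gives text-out and bytes-in gives bytes-out.
    sep = b'/' if isinstance(path, bytes) else '/'
    return path.rsplit(sep, 1)[-1]

print(last_component('usr/share/doc'))   # doc
print(last_component(b'usr/share/doc'))  # b'doc'
```

The care goes into the literal: a hard-coded '/' would make the bytes case fail with a TypeError.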

The real problem apparently lies in (what I believe is only a few
rare) APIs that are text-or-bytes-in and always-text-out (or
always-bytes-out). Let's call them *hybrid*. Clearly, mixing hybrid
APIs in a stream of pure or polymorphic API calls is a problem,
because they turn a pure or polymorphic overall operation into a
hybrid one.
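The hazard can be shown with two hypothetical helpers. normalize() accepts str or bytes but always returns str, so it is hybrid. join_parts() is polymorphic and needs both arguments to be of the same type. Dropping the hybrid call into an otherwise all-bytes chain breaks it.

```python
def normalize(value):
    # Hybrid: str or bytes in, always str out.
    if isinstance(value, bytes):
        value = value.decode('latin-1')
    return value.strip().lower()

def join_parts(a, b):
    # Polymorphic: str + str -> str, bytes + bytes -> bytes.
    return a + b

print(join_parts(b'prefix-', b'NAME'))  # b'prefix-NAME'

try:
    # bytes + str: the hybrid call turned one argument into text.
    join_parts(b'prefix-', normalize(b' NAME '))
except TypeError as exc:
    print('TypeError:', exc)
```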

There are also text-in, bytes-out or bytes-in, text-out APIs that are
intended for encoding/decoding of course, but these are in a totally
different class.

Abstractly, it would be good if there were as few hybrid APIs as
possible, many pure or polymorphic APIs (which it should be in a
particular case is a pragmatic choice), and a limited number of
encoding/decoding APIs, which should generally be invoked at the edges
of the program (e.g., I/O).

> Ironically, it almost *would* have been better if bytes simply didn't work
> as strings at all, *ever*, but if you could wrap them with a bstr() to
> *treat* them as text.  You could still have restrictions on combining them,
> as long as it was a restriction on the unicode you mixed with them.  That
> is, if you could combine a bstr and a str if the *str* was restricted to
> ASCII.

ISTR that we considered something like this and decided to stay away
from it. At this point I think that a successful 3rd party bstr
implementation would be required before we rush to add one to the
stdlib.

> If we had the Python 3 design discussions to do over again, I think I would
> now have stuck with the position of not letting bytes be string-compatible
> at all,

They aren't, unless you consider the presence of some methods with
similar behavior (.lower(), .split() and so on) and the existence of
some polymorphic APIs (see above) as "compatibility".

> and instead proposed an explicit bstr() wrapper/adapter to use them
> as strings, that would (in that case) force coercion in the direction of
> bytes rather than strings.  (And bstr need not have been a builtin - it
> could have been something you import, to help discourage casual usage.)

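I'm still unclear on exactly what bstr is supposed to be, but it sounds
a bit like one of the rejected proposals for having a single
(Unicode-capable) str type that is implemented using different width
encodings (Latin-1, UCS-2, UCS-4) underneath.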

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby

At 05:49 PM 6/21/2010 +0100, Michael Foord wrote:
> Why is your proposed bstr wrapper not practical to implement outside
> the core and use in your own libraries and frameworks?

__contains__ doesn't have a converse operation, so you can't code a 
type that works around this (Python 3.1 shown):

>>> from os.path import join
>>> join(b'x','y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python31\lib\ntpath.py", line 161, in join
    if b[:1] in seps:
TypeError: Type str doesn't support the buffer API
>>> join('y',b'x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python31\lib\ntpath.py", line 161, in join
    if b[:1] in seps:
TypeError: 'in <string>' requires string as left operand, not bytes

IOW, only one of these two cases can be worked around by using a bstr 
(or ebytes) that doesn't have support from the core string type.

I'm not sure if the "in" operator is the only case where implementing
such a type would fail, but it's the most obvious one.  String
formatting, of both the % and .format() varieties, is
another.  (__rmod__ doesn't help if your bytes object is one of
several data items in a tuple or dict -- the common case for % formatting.)
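The asymmetry can be reproduced with a hypothetical bytes subclass. It is called ebytes here after the proposal in this thread, but it is not a real type. When the subclass is the container, its __contains__ runs and can coerce a str operand. When it is the left operand of "in" against a str, only str.__contains__ is consulted, and there is no reflected hook. %-formatting likewise never asks the items for help.

```python
class ebytes(bytes):
    """Hypothetical 'bytes with an encoding' (illustration only)."""

    def __new__(cls, data, encoding='ascii'):
        obj = super().__new__(cls, data)
        obj.encoding = encoding
        return obj

    def __contains__(self, item):
        # Coerce a str operand by encoding it with our own encoding.
        if isinstance(item, str):
            item = item.encode(self.encoding)
        return super().__contains__(item)

seps = '\\/'

# Works: ebytes is the container, so ebytes.__contains__ is called.
print('/' in ebytes(b'a/b'))  # True

# Fails: str.__contains__ is called, and there is no reflected variant.
try:
    ebytes(b'/') in seps
except TypeError as exc:
    print('TypeError:', exc)

# str.__mod__ formats the tuple itself; the items' own hooks (such as
# __rmod__) are never consulted, so the bytes repr leaks into the text.
print('%s/%s' % (ebytes(b'dir'), 'file'))  # b'dir'/file
```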


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Terry Reedy


On 6/21/2010 11:43 AM, Barry Warsaw wrote:
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
> for it.
>
> Would it make sense to have "encoding-carrying" bytes and str types?

On 2009-11-5 I posted 'Add encoding attribute to bytes' to python-ideas. 
It was shot down at the time.


Terry Jan Reedy


Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Terry Reedy


On 6/21/2010 8:51 AM, Nick Coghlan wrote:
> I don't know that the "all is well" camp actually exists. The camp
> that I do see existing is the one that says "without a bug report,
> inconsistencies in the standard library's unicode handling won't get
> fixed".
>
> The issues picked up by the regression test suite have already been
> dealt with, but that suite is unfortunately far from comprehensive.
> Just like a lot of Python code that is out there, the standard library
> isn't immune to the poor coding practices that were permitted by the
> blurry lines between text and octet streams in 2.x.
>
> It may be that there are places where we need to rewrite standard
> library algorithms to be bytes/str neutral (e.g. by using length one
> slices instead of indexing). It may be that there are more APIs that
> need to grow "encoding" keyword arguments that they then pass on to
> the functions they call or use to convert str arguments to bytes (or
> vice-versa). But without people trying to port affected libraries and
> reporting bugs when they find issues, the situation isn't going to
> improve.
>
> Now, if these bugs are already being reported against 3.1 and just
> aren't getting fixed, that's a completely different story...

Some of these have been reported, over a year ago. See, for instance,
http://bugs.python.org/issue5468
I am getting the impression that the people who use the web modules
tend, like me, not to have the tools to write and test patches. So they
can squeak but not grease.


Terry Jan Reedy


Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby


At 12:56 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> One comment here -- you can also have URIs that aren't decodable into their
> true textual meaning using a single encoding.
>
> Apache will happily serve out URIs that have UTF-8, Shift-JIS, and EUC-JP
> components inside of their path, but the textual representation that was intended
> will be garbled (or be represented by escaped byte sequences).  For that
> matter, Apache will serve requests that have no true textual representation,
> as it is working on the byte level rather than the character level.
>
> So a complete solution really should allow the programmer to pass in URIs as
> bytes when the programmer knows that they need it.

ebytes(somebytes, 'garbage'), perhaps, which would be like ascii, but
where combining with non-garbage would result in another 'garbage' ebytes?



[Python-Dev] red buildbots on 2.7

2010-06-21 Thread Bill Janssen

Considering that we've just released 2.7rc2, there are an awful lot of
red buildbots for 2.7.  In fact, I don't remember having seen a green
buildbot for OS X and 2.7.  Shouldn't these be fixed?

On OS X Leopard, I'm seeing failures in test_py3kwarn,
test_urllib2_localnet, test_uuid.

On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn,
test_ttk_guionly, and test_urllib2_localnet.

We don't have a buildbot running Snow Leopard, apparently.

Bill

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Stephen J. Turnbull

P.J. Eby writes:

 > Note too that this is an argument for symmetry in wrapping the
 > inputs and outputs, so that the code doesn't have to "know" what
 > it's dealing with!

and

 > After all, right now, if a stdlib function might return bytes or 
 > unicode depending on runtime conditions, I can't even hardcode an 
 > .encode() call -- it would fail if the return type is a bytes.

I'm lost.  What stdlib functions are you talking about whose return
type depends on runtime conditions, and what runtime conditions?  What
do you mean by "wrapping"?

The only times I've run into str/bytes nondeterminacy are when I've
mixed str/bytes myself, and passed them into functions that are
type-identities (str -> str, bytes -> bytes), which then appear to
give a nondeterministic result.  It's a deterministic bug, though,
always mine.

 > It's One Obvious Way that I want, but some people seem to be arguing 
 > that the One Obvious Way is to Think Carefully About It Every Time -- 
 > and that seems to violate the "Obvious" part, IMO.  ;-)

Nick alluded to the One Obvious Way as a change in architecture.

Specifically: Decode all bytes to typed objects (str, images, audio,
structured objects) at input.  Do no manipulations on bytes ever
except decode and encode (both to text, and to special-purpose objects
such as images) in a program that does I/O.  (Obviously image
manipulation libraries etc will have to operate on bytes, but they
should have no functions that consume bytes except constructors a la
bytes.decode() for text, and no functions that produce bytes except
the output serializers that write files and the like, a la
str.encode().)  Encode back to bytes on output.

Yes, this is tedious if you live in an ASCII world, compared to using
bytes as characters.  However, it works for the rest of us, which the
old style doesn't.

As for "Think Carefully About It Every Time", that is required only in
Porting Programs That Mix Operation On Bytes With Operation On Str.
If you write programs from scratch, however, the decode-process-encode
paradigm quickly becomes second nature.
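Here is a minimal sketch of the decode-process-encode structure, using in-memory byte streams as stand-ins for real I/O. The function names and the choice of UTF-8 are assumptions for illustration. In ordinary programs, opening files in text mode with an explicit encoding= argument does the boundary decoding and encoding for you.

```python
import io

def process(text):
    # Core logic: operates on str only and never touches bytes.
    return text.upper()

def run(infile, outfile, encoding='utf-8'):
    data = infile.read().decode(encoding)   # decode at the input edge
    result = process(data)                  # work on text in the middle
    outfile.write(result.encode(encoding))  # encode at the output edge

src = io.BytesIO('café\n'.encode('utf-8'))
dst = io.BytesIO()
run(src, dst)
print(dst.getvalue())  # b'CAF\xc3\x89\n'
```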

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Stephen J. Turnbull

Barry Warsaw writes:

 > Would it make sense to have "encoding-carrying" bytes and str
 > types?

Why limit that to bytes and str?  Why not have all objects carry their
serializer/deserializer around with them?

I think the answer is "no", though, because (1) it would constitute an
attractive nuisance (the default would be abused, it would work fine
in Kansas, and all hell would break loose in Kagoshima, simply
delaying the pain and/or passing it on to third parties), and (2) you
really want this under control of higher level objects that have
access to some knowledge of the environment, rather than the lowest
level.

Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby


At 10:29 AM 6/21/2010 -0700, Guido van Rossum wrote:
> Perhaps there are more situations where a polymorphic API would be
> helpful. Such APIs are not always so easy to implement, because they
> have to be careful with literals or other constants (and even more so
> mutable state) used internally -- but it can be done, and there are
> plenty of examples in the stdlib.

What if we could use the time machine to make the APIs that *were* 
polymorphic, regain their previously-polymorphic status, without 
needing to actually *change* any of the code of those functions?


That's what Barry's ebytes proposal would do, with appropriate 
coercion rules.  Passing ebytes into such a function would yield back 
ebytes, even if the function used strings internally, as long as 
those strings could be encoded back to bytes using the ebytes' 
encoding.  (Which would normally be the case, since stdlib constants 
are almost always ASCII, and the main use cases for ebytes would 
involve ascii-extended encodings.)




> I'm still unclear on exactly what bstr is supposed to be, but it sounds
> a bit like one of the rejected proposals for having a single
> (Unicode-capable) str type that is implemented using different width
> encodings (Latin-1, UCS-2, UCS-4) underneath.


Not quite - as modified by Barry's proposal (which I like better than 
mine) it'd be an object that just combines bytes with an attribute 
indicating the underlying encoding.  When it interacts with strings, 
the strings are *encoded* to bytes, rather than upgrading the bytes to text.


This is actually a big advantage for error-detection in any 
application where you're working with data that *must* be encodable 
in a specific encoding for output, as it allows you to catch errors 
much *earlier* than you would if you only did the encoding at your 
output boundary.
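This is a rough sketch of that coercion rule as a hypothetical pure-Python ebytes. It is not a real stdlib type and not code from Barry's or PJE's proposals. Combining with a str *encodes* the str using the ebytes' encoding, so an unencodable character fails at the point of combination rather than at output. Only "+" is covered, because a str on the left of "+" falls through to __radd__. The "in" operator and formatting have no such fallback.

```python
class ebytes(bytes):
    """Hypothetical 'bytes with an encoding' (illustration only)."""

    def __new__(cls, data, encoding):
        obj = super().__new__(cls, data)
        obj.encoding = encoding
        return obj

    def _coerce(self, other):
        # str is *encoded* to bytes; an unencodable str fails right here.
        if isinstance(other, str):
            return other.encode(self.encoding)
        if isinstance(other, ebytes):
            if other.encoding != self.encoding:
                raise TypeError('cannot mix %s and %s ebytes'
                                % (self.encoding, other.encoding))
            return bytes(other)
        if isinstance(other, (bytes, bytearray)):
            return bytes(other)
        raise TypeError('cannot combine ebytes with %s'
                        % type(other).__name__)

    def __add__(self, other):
        return ebytes(bytes(self) + self._coerce(other), self.encoding)

    def __radd__(self, other):
        return ebytes(self._coerce(other) + bytes(self), self.encoding)

    def __repr__(self):
        return 'ebytes(%s, %r)' % (bytes.__repr__(self), self.encoding)

path = ebytes(b'/inbox/', 'latin-1')
print(path + 'caf\u00e9')  # ebytes(b'/inbox/caf\xe9', 'latin-1')
print('/user' + path)      # str on the left is handled via __radd__

try:
    path + '\u3046'        # HIRAGANA LETTER U: not representable in Latin-1
except UnicodeEncodeError as exc:
    print('caught at combination time:', exc.reason)
```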


Anyway, this would not be the normal bytes type or string type; it's 
"bytes with an encoding".  It's also more general than Unicode, in 
the sense that it allows you to work with character sets that don't 
really *have* a proper Unicode mapping.


One issue I remember from my "enterprise" days is some of the 
Asian-language developers at NTT/Verio explaining to me that unicode 
doesn't actually solve certain issues -- that there are use cases 
where you really *do* need "bytes plus encoding" in order to properly 
express something.  Unfortunately, I never quite wrapped my head 
around the idea, I just remember it had something to do with the fact 
that Unicode has single character codes that mean different things in 
different languages, such that you were actually losing information 
by converting to unicode, or something like that.  (Or maybe the 
characters were expressed differently in certain encodings according 
to what language they came from, so you couldn't roundtrip them 
through unicode without losing information.  I think that's probably
what it was; maybe somebody here can chime in more on that point.)


Anyway, a type like this would need to have at least a bit of support 
from the core language, because the str type would need to be able to 
handle at least the __contains__ and %/.format() coercion cases, 
since these functions don't have __r*__ equivalents that a 
user-implemented type could provide...  and strings don't have 
anything like a '__coerce__' either.


If sufficient hooks existed, then an ebytes could be implemented 
outside the stdlib, and still used within it.



Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Benjamin Peterson

2010/6/21 Bill Janssen:
> Considering that we've just released 2.7rc2, there are an awful lot of
> red buildbots for 2.7.  In fact, I don't remember having seen a green
> buildbot for OS X and 2.7.  Shouldn't these be fixed?

It seems most of them are offline and their last run was just a failure.

>
> On OS X Leopard, I'm seeing failures in test_py3kwarn,
> test_urllib2_localnet, test_uuid.
>
> On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn,
> test_ttk_guionly, and test_urllib2_localnet.

File bug reports.

>
> We don't have a buildbot running Snow Leopard, apparently.




-- 
Regards,
Benjamin

[Python-Dev] [OT] carping about irritating people (was: bytes / unicode)

2010-06-21 Thread Stephen J. Turnbull

Ben Finney writes:
 > "Stephen J. Turnbull" writes:
 > 
 > > your base URL is gonna be b'mailto:[email protected]', but the
 > > natural thing the UI will want to do is
 > >
 > > formurl = baseurl + '?subject=うるさいやつだなぁ…'
 > 
 > Incidentally, which irritating person was the topic of this
 > Japanese-language message to you?

(Kudos to Nick.)  "Urusai" is also used to refer to the finicky.  So,
the RFC-toting pedant.  Ie, me.

 > (The subject in Stephen's example message translates roughly as
 > "(unspecified third person)

Not quite.  The subject of the copula, if omitted, is entirely
context-dependent.



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby


At 01:36 PM 6/21/2010 -0400, Terry Reedy wrote:
> On 6/21/2010 11:43 AM, Barry Warsaw wrote:
>> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
>> for it.
>>
>> Would it make sense to have "encoding-carrying" bytes and str types?
>
> On 2009-11-5 I posted 'Add encoding attribute to bytes' to
> python-ideas. It was shot down at the time.

AFAICT, that's mainly for lack of apparent use cases, and also for 
confusion.  Here, the use case (restoring the polymorphy of stdlib 
APIs) is pretty clear.


However, if we had the string equivalent of a coercion protocol (that 
core strings and bytes would co-operate with), then it would enable 
people to write their own versions of either your idea or Barry's 
idea (or other things altogether), and still get the stdlib to play along.


Personally, I think ebytes() would do the trick and it'd be nice to 
see it in stdlib, but gaining a string coercion protocol instead 
might not be a bad tradeoff.  ;-)



Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Antoine Pitrou

On Mon, 21 Jun 2010 10:56:59 PDT
Bill Janssen wrote:
> Considering that we've just released 2.7rc2, there are an awful lot of
> red buildbots for 2.7.  In fact, I don't remember having seen a green
> buildbot for OS X and 2.7.  Shouldn't these be fixed?
> 
> On OS X Leopard, I'm seeing failures in test_py3kwarn,
> test_urllib2_localnet, test_uuid.
> 
> On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn,
> test_ttk_guionly, and test_urllib2_localnet.

I'm afraid they can only be fixed by whoever is competent on OS X
issues. If you want to tackle them, you're more than welcome.

There also seem to be a couple of failures left with test_gdb...

Regards

Antoine.



Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Paul Moore

On 21 June 2010 18:56, Bill Janssen wrote:
> Considering that we've just released 2.7rc2, there are an awful lot of
> red buildbots for 2.7.  In fact, I don't remember having seen a green
> buildbot for OS X and 2.7.  Shouldn't these be fixed?

Ack! My buildbot has looked fine, but on closer inspection, it was the
same build that's been running (more accurately, stuck in a test) for
5 days :-(

The main buildslave page looked fine - except for the dates, which I
didn't spot.

Thanks for the alert. I've killed the stuck test and should see some
runs going through now. Shame, really, I was getting used to seeing a
nice page of all green results...

Paul.

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby


At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:
> Nick alluded to the One Obvious Way as a change in architecture.
>
> Specifically: Decode all bytes to typed objects (str, images, audio,
> structured objects) at input.  Do no manipulations on bytes ever
> except decode and encode (both to text, and to special-purpose objects
> such as images) in a program that does I/O.

This ignores the existence of use cases where what you have is text 
that can't be properly encoded in unicode.  I know, it's a hard thing 
to wrap one's head around, since on the surface it sounds like 
unicode is the programmer's savior.  Unfortunately, real-world text 
data exists which cannot be safely roundtripped to unicode, and must 
be handled in "bytes with encoding" form for certain operations.
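One widely cited illustration of this class of problem is the Japanese "wave dash". It is not necessarily the case encountered at NTT/Verio. The same two Shift_JIS bytes decode to different Unicode characters depending on which vendor variant of the codec is assumed. So the bytes alone, without the exact encoding, do not determine the text. A short check with Python's bundled codecs:

```python
# Bytes 0x81 0x60: the JIS X 0208 "wave dash" as encoded in Shift_JIS.
raw = b'\x81\x60'

as_sjis = raw.decode('shift_jis')  # Python's JIS X 0208-based mapping
as_cp932 = raw.decode('cp932')     # Microsoft's Shift_JIS variant

print(ascii(as_sjis), ascii(as_cp932))  # '\u301c' '\uff5e'

# The same bytes become two different, unequal strings, so which
# variant of the character set they came from is information that matters.
print(as_sjis == as_cp932)  # False
```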


I personally do not have to deal with this *particular* use case any 
more -- I haven't been at NTT/Verio for six years now.  But I do know 
it exists for e.g. Asian language email handling, which is where I 
first encountered it.  At the time (this *may* have changed), many 
popular email clients did not actually support unicode, so you 
couldn't necessarily just send off an email in UTF-8.  It drove us 
nuts on the project where this was involved (an i18n of an existing 
Python app), and I think we had to compromise a bit in some fashion 
(because we couldn't really avoid unicode roundtripping due to 
database issues), but the use case does actually exist.


My current needs are simpler, thank goodness.  ;-)  However, they 
*do* involve situations where I'm dealing with *other* 
encoding-restricted legacy systems, such as software for interfacing 
with the US Postal Service that only works with a restricted subset 
of latin1, while receiving mangled ASCII from an ecommerce provider, 
and storing things in what's effectively a latin-1 database.  Being 
able to easily assert what kind of bytes I've got would actually let
me catch errors sooner, *if* those assertions were being checked when
different kinds of strings or bytes were being combined (i.e., at
coercion time).




> Yes, this is tedious if you live in an ASCII world, compared to using
> bytes as characters.  However, it works for the rest of us, which the
> old style doesn't.


I'm not trying to go back to the old style -- ideally, I want 
something that would actually improve on the "it's not really 
unicode" use cases above if it were available in 2.x.


I don't want to be "encoding agnostic" or "encoding implicit" -- I
want to make it possible to be even *more* explicit and restrictive 
than it is currently possible to be in either 2.x OR 3.x.  It's just 
that 3.x affords greater opportunity for doing this, and is an ideal 
place to make the switch -- i.e., at a point where you now have to 
get explicit about your encodings, anyway!




> As for "Think Carefully About It Every Time", that is required only in
> Porting Programs That Mix Operation On Bytes With Operation On Str.
> If you write programs from scratch, however, the decode-process-encode
> paradigm quickly becomes second nature.


Which works if and only if your outputs are truly unicode-able.  If 
you work with legacy systems (e.g. those Asian email clients and US 
postal software), you are really working with a *character set*, not 
unicode, and so putting your data in unicode form is actually *wrong* 
-- an expedient lie.


Heresy, I know, but there you go.  ;-)


Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Robert Collins

2010/6/21 Stephen J. Turnbull:
> Robert Collins writes:
>
>  > Also, url's are bytestrings - by definition;
>
> Eh?  RFC 3896 explicitly says

"Definitions of Managed Objects for the DS3/E3 Interface Type"

Perhaps you mean 3986 ? :)

>    A URI is an identifier consisting of a sequence of characters
>    matching the syntax rule named <URI> in Section 3.
>
> (where the phrase "sequence of characters" appears in all ancestors I
> found back to RFC 1738), and

Sure, ok, let me unpack what I meant just a little. An abstract URI is
neither unicode nor bytes per se - see section 1.2.1 " A URI is a
sequence of characters from a very limited set: the letters of the
basic Latin alphabet, digits, and a few special characters. "

URI interpretation is fairly strictly separated between producers and
consumers. A consumer can manipulate a url with other url fragments -
e.g. doing urljoin. But it needs to keep the url as a url and not try
to decode it to a unicode representation.

The producer of the url however, can decode via whatever heuristics it
wants - because it defines the encoding used to go from unicode to URL
encoding.

As an example, if I give the URI "http://server/%c3%83", rendering
that as http://server/Ã can lead to transcription errors and
reinterpretation problems unless you know - out of band - that the
server is using UTF-8 to encode. Conversely, if someone enters
http://server/Ã in their browser window, choosing UTF-8 or their local
encoding is quite arbitrary and may not match how the server would
represent that resource.
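The ambiguity can be seen with urllib.parse. The escapes %c3%83 decode to a single 'Ã' under UTF-8 but to two characters under Latin-1. Going the other way, encoding 'Ã' gives different escapes depending on which encoding the producer chose. Only the path part of the example URI is used here.

```python
from urllib.parse import quote, unquote

uri_path = '/%c3%83'

# The same escapes decode differently depending on the assumed encoding.
print(ascii(unquote(uri_path, encoding='utf-8')))    # '/\xc3'
print(ascii(unquote(uri_path, encoding='latin-1')))  # '/\xc3\x83'

# In the other direction, the producer's encoding choice changes the URI.
print(quote('/\u00c3', encoding='utf-8'))    # /%C3%83
print(quote('/\u00c3', encoding='latin-1'))  # /%C3
```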

Beyond that, producers can do odd things - like when there are a
series of servers stacked and forwarding requests amongst themselves -
where they generate different parts of the same URL using different
encodings.

>    2.  Characters
>
>    The URI syntax provides a method of encoding data, presumably for
>    the sake of identifying a resource, as a sequence of characters.
>    The URI characters are, in turn, frequently encoded as octets for
>    transport or presentation.  This specification does not mandate any
>    particular character encoding for mapping between URI characters
>    and the octets used to store or transmit those characters.  When a
>    URI appears in a protocol element, the character encoding is
>    defined by that protocol; without such a definition, a URI is
>    assumed to be in the same character encoding as the surrounding
>    text.

That's true, but it's been taken out of context; the set of characters
permitted in a URL is a strict subset of the characters found in ASCII;
there is a BNF that defines it and it is quite precise. While it
doesn't define a set of octets, it also doesn't define support for
Unicode characters - individual schemes need to define the mapping
used between characters defined as safe and those that get percent
encoded, e.g. unicode (abstract) -> utf8 -> percent encoded.

See also the section on comparing URLs - Unicode isn't at all relevant.

>  > if the standard library has made them unicode objects in 3, I
>  > expect a lot of pain in the webserver space.
>
> Yup.  But pain is inevitable if people are treating URIs (whether URLs
> or otherwise) as octet sequences.  Then your base URL is gonna be
> b'mailto:[email protected]', but the natural thing the UI will want
> to do is
>
>    formurl = baseurl + '?subject=うるさいやつだなぁ…'
>
> IMO, the UI is right.  "Something" like the above "ought" to work.

I wish it would. The problem is not in Python here, though - and
casual handwaving will exacerbate it, not fix it. Modelling URLs as
string-like things is great from a convenience perspective, but, like
file paths, they are much more complex.

For your particular case, subject contains characters outside the URL
specification, so someone needs to choose an encoding to get them into
a sequence-of-bytes-that-can-be-percent-escaped.
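Stephen's proposal, a composing function that takes str and does all the escaping, still has to pick that encoding somewhere. A sketch, assuming a hypothetical compose_mailto helper built on urllib.parse.urlencode; quote_via=quote is used so that spaces become %20 rather than +, and the encoding is left as an explicit parameter rather than hidden.

```python
from urllib.parse import quote, urlencode

def compose_mailto(address, subject, encoding="utf-8"):
    """Hypothetical helper: the UI passes plain str; all escaping happens here.

    The encoding is a parameter because the URL syntax itself does not
    choose one for characters outside its allowed set.
    """
    query = urlencode({"subject": subject}, quote_via=quote, encoding=encoding)
    return "mailto:" + address + "?" + query
```

For example, compose_mailto("user@example.com", "café") gives mailto:user@example.com?subject=caf%C3%A9, while passing encoding="latin-1" gives ...?subject=caf%E9 for the same subject. Both are syntactically valid, which is exactly why someone has to make the choice.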

Section 2.5, "Identifying Data", goes into this to some degree. Note a
trap - the last paragraph says 'when a *NEW* URI scheme...' (emphasis
mine). Existing schemes do not mandate UTF8, which is why the
producer/consumer split matters. I spent a few minutes looking, but
it's lost in the minutiae somewhere - HTTP does not specify UTF8
(though I wish it would) for its URIs, and STD 66 is the generic
definition and rules for new URI schemes, preserving intact the
mistake of HTTP.

> So the function that actually handles composing the URL should take a
> string (ie, unicode), and do all escaping.  The UI code should not
> need to know about escaping.  If nothing escapes except the function
> that puts the URL in composed form, and that function always escapes,
> life is easy.

Arg. The problem is very similar to the file system problem:
 - We get given a sequence of bytes
 - we have some rules that will let us manipulate the sequence to get
hostnames, query parameters and so forth
 - and others to let us walk a directory structure
 - and no guarantee that any of the data is in any particular enc
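Python 3's urllib.parse can apply the structural rules to bytes without decoding them, which matches this model. A minimal sketch, assuming a hypothetical URL whose path was produced as UTF-8 and whose query as Latin-1 (the mixed-producer case mentioned earlier). The input is already percent-escaped ASCII, which is what those functions expect when they are given bytes.

```python
from urllib.parse import urlsplit, unquote_to_bytes

# Hypothetical input: path escaped from UTF-8, query escaped from Latin-1.
raw_url = b"http://server/caf%C3%A9?subject=%E9t%E9"

parts = urlsplit(raw_url)                  # bytes in, bytes out (SplitResultBytes)
host = parts.hostname                      # b"server"
path_octets = unquote_to_bytes(parts.path)  # b"/caf\xc3\xa9"
_, _, value = parts.query.partition(b"=")
subject_octets = unquote_to_bytes(value)   # b"\xe9t\xe9"

def try_decode(octets, encoding):
    """Decoding needs an encoding chosen out of band; a wrong guess may fail."""
    try:
        return octets.decode(encoding)
    except UnicodeDecodeError:
        return None
```

Here try_decode(path_octets, "utf-8") gives "/café", but try_decode(path_octets, "latin-1") silently gives "/cafÃ©", and the query value only decodes correctly as Latin-1. Nothing in the octets themselves says which guess was right.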

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Bill Janssen

Benjamin Peterson  wrote:

> 2010/6/21 Bill Janssen :
> > Considering that we've just released 2.7rc2, there are an awful lot of
> > red buildbots for 2.7.  In fact, I don't remember having seen a green
> > buildbot for OS X and 2.7.  Shouldn't these be fixed?
> 
> It seems most of them are offline and their last run was just a failure.

No, the three OS X buildbots are all online and reporting failures.  As
far as I can remember, they haven't been green for weeks.

They are at the end of the buildbot list, so off-screen if you are using
a normal browser.  You have to scroll to see them.

> > On OS X Leopard, I'm seeing failures in test_py3kwarn,
> > test_urllib2_localnet, test_uuid.
> >
> > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn,
> > test_ttk_guionly, and test_urllib2_localnet.

Um -- saying what, the buildbots are red?  Shouldn't having green
buildbots be a part of the release process?  In fact, it is -- but none
of the OS X buildbots are part of the "stable" set.  Why is that?

Bill
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 03:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:

> Barry Warsaw writes:
>
>  > Would it make sense to have "encoding-carrying" bytes and str
>  > types?
>
> I think the answer is "no", though, because (1) it would constitute an
> attractive nuisance (the default would be abused, it would work fine
> in Kansas, and all hell would break loose in Kagoshima, simply
> delaying the pain and/or passing it on to third parties),

You have the proposal exactly backwards, actually.

In Kagoshima, you'd pass in an ebytes with your encoding to a 
stdlib API, and *get back an ebytes with the right encoding*, rather 
than an (incorrect and useless) unicode object which has lost data you need.

> Why limit that to bytes and str?  Why not have all objects carry their
> serializer/deserializer around with them?

Because it's not a serialization or deserialization.  Your conceptual 
framework here implies that unicode objects are the real thing, and 
that bytes are "just" a way of transporting unicode around.

But this is not the case at all, for use cases where "no, really, you 
*have to* work with bytes-encoded text streams".  The mere release of 
Python 3.x will not cause all the world's applications, libraries, 
and protocols to suddenly work with unicode, where they did not before.

Being explicit about the encoding of the bytes you're flinging around 
is actually an *increase* in specificity, explicitness, robustness, 
and error-checking ability over the status quo for either 2.x *or* 
3.x...  *and* it improves these qualities for essentially *all* 
string-handling code, without requiring that code to be rewritten to do so.

It's like getting to use the time machine, really.

> and (2) you
> really want this under control of higher level objects that have
> access to some knowledge of the environment, rather than the lowest
> level.

This proposal actually has such a higher-level object: an 
ebytes.  And it passes that information *through* the lowest level, 
in such a way as to permit the stringlike operations to be fully 
polymorphic, without the information being lost inside somebody else's API.


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 1:56 PM, Bill Janssen wrote:

> Considering that we've just released 2.7rc2, there are an awful lot of
> red buildbots for 2.7.  In fact, I don't remember having seen a green
> buildbot for OS X and 2.7.  Shouldn't these be fixed?
> 
> On OS X Leopard, I'm seeing failures in test_py3kwarn,
> test_urllib2_localnet, test_uuid.
> 
> On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn,
> test_ttk_guionly, and test_urllib2_localnet.
> 
> We don't have a buildbot running Snow Leopard, apparently.

On my OS X 10.6.4 box, only test_py3kwarn and test_urllib2_localnet fail.

-Barry


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Antoine Pitrou

On Mon, 21 Jun 2010 12:13:05 PDT
Bill Janssen  wrote:
> 
> > > On OS X Leopard, I'm seeing failures in test_py3kwarn,
> > > test_urllib2_localnet, test_uuid.
> > >
> > > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn,
> > > test_ttk_guionly, and test_urllib2_localnet.
> 
> Um -- saying what, the buildbots are red?  Shouldn't having green
> buildbots be a part of the release process?  In fact, it is -- but none
> of the OS X buildbots are part of the "stable" set.  Why is that?

Benjamin is not qualified to fix OS X bugs AFAIK (if you are, Benjamin,
then sorry for misrepresenting you :-)). Actually, neither are most of
us.

Apparently some of these buildbots belong to you. Why don't you step
up and investigate?

Thanks,

Antoine.



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi

On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
> At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> >What do you think of making the encoding attribute a mandatory part of
> >creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
> 
> As long as the coercion rules force str+ebytes (or str % ebytes,
> ebytes % str, etc.) to result in another ebytes (and fail if the str
> can't be encoded in the ebytes' encoding), I'm personally fine with
> it, although I really like the idea of tacking the encoding to bytes
> objects in the first place.
> 
I wouldn't like this.  It brings us back to the python2 problem where
sometimes you pass an ebyte into a function and it works and other times you
pass an ebyte into the function and it issues a traceback.  The coercion
must end up with a str and no traceback (this assumes that we've checked
that the ebyte and the encoding "match" when we create the ebyte).

If you want bytes out the other end, you should either have a different
function or explicitly transform the output from str to bytes.
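A minimal sketch of the semantics Toshio describes, using a hypothetical EBytes class (not a real Python type): the encoding is mandatory, the bytes are checked against it when the object is created, and mixing with str always produces str. Getting bytes back out is a separate, explicit step.

```python
class EBytes:
    """Hypothetical 'ebytes': mandatory encoding, validated at creation time.

    Mixing with str always yields str, so a mixed operation cannot fail
    because of what the data happens to contain.
    """

    def __init__(self, data, encoding):
        if not isinstance(data, bytes):
            raise TypeError("data must be bytes")
        data.decode(encoding)  # strict decode: a mismatch raises UnicodeDecodeError here
        self._data = data
        self._encoding = encoding

    @property
    def encoding(self):
        return self._encoding

    def __str__(self):
        return self._data.decode(self._encoding)

    def __bytes__(self):
        # The explicit way back to plain bytes.
        return self._data

    def __add__(self, other):
        if isinstance(other, str):
            return str(self) + other
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            return other + str(self)
        return NotImplemented
```

With eb = EBytes("日本".encode("euc-jp"), "euc-jp"), both eb + "語" and "x" + eb return ordinary str, while EBytes(b"\xff", "utf-8") fails immediately at construction.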

So, what's the advantage of using ebytes instead of bytes?

* It keeps together the text and encoding information when you're taking
  bytes in and want to give bytes back under the same encoding.
* It takes some of the boilerplate that people are supposed to do (checking
  that bytes are legal in a specific encoding) and writes it into the
  initialization of the object.  That forces you to think about the issue
  at two points in the code:  when converting into ebytes and when
  converting out to bytes.  For data that's going to be used with both
  str and bytes, this is the accepted best practice.  (For exceptions, the
  byte type remains which you can do conversion on when you want to).

-Toshio


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Benjamin Peterson

2010/6/21 Bill Janssen :
> They are at the end of the buildbot list, so off-screen if you are using
> a normal browser.  You have to scroll to see them.

But not on the "stable" view and that's the only one I look at.



-- 
Regards,
Benjamin

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 11:13 AM, Stephan Richter wrote:

>I really just want to be able to go to PyPI, Click on "Browse packages" and 
>then select "Python 3" (it can currently be accomplished by clicking "Python" 
>and then  "3"). Of course, package developers need to be encouraged to add 
>these Trove classifiers so that the listings are as complete as possible.

Trove classifiers are not particularly user friendly.  I wonder if we can help
with a (partially) automated or guided tool to help?  Maybe something on the
web page for packages w/o classifications, kind of like a Linked-in progress
meter...
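One small building block for such a guided tool might be a check for the Python 3 Trove classifier mentioned elsewhere in this thread ('Programming Language :: Python :: 3'). A sketch only: the helper name and its message text are hypothetical, not part of PyPI.

```python
PY3_CLASSIFIER = "Programming Language :: Python :: 3"

def python3_hint(classifiers):
    """Hypothetical check: return a suggestion if no Python 3 classifier is declared."""
    # Accept the bare classifier and more specific ones such as ":: 3.1" or ":: 3 :: Only".
    declared = any(
        c == PY3_CLASSIFIER
        or c.startswith(PY3_CLASSIFIER + ".")
        or c.startswith(PY3_CLASSIFIER + " ")
        for c in classifiers
    )
    if declared:
        return None
    return "Consider adding %r if the package supports Python 3." % PY3_CLASSIFIER
```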

-Barry


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Michael Foord


On 21/06/2010 20:30, Benjamin Peterson wrote:
> 2010/6/21 Bill Janssen:
>> They are at the end of the buildbot list, so off-screen if you are using
>> a normal browser.  You have to scroll to see them.
>
> But not on the "stable" view and that's the only one I look at.

What are the requirements for moving the OS X buildbots into the stable 
view? Are the builders themselves stable enough? (If the requirement is 
that the buildbots be green then it is something of a catch-22.)


All the best,

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:

>I like the idea of having encoding information carried with the data.
>I don't think that an ebytes type that can *optionally* have an encoding
>attribute makes the situation less confusing, though.

Agreed.  I think the attribute should always be there, but there probably
needs to be a magic value (perhaps None) that indicates an unknown, manual,
garbage, error, or broken encoding.

Examples: you read bytes off a socket and don't know what the encoding is; you
concatenate two ebytes that have incompatible encodings.
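A sketch of this variant, again with a hypothetical EBytes class: the attribute is always present, None stands for the "unknown/broken" marker, and both examples (unlabelled socket data, concatenating incompatible encodings) produce it. Normalising names through codecs.lookup is an added assumption, so that, say, "UTF8" and "utf-8" compare equal.

```python
import codecs

def _canonical(encoding):
    # None is the "I don't know" marker; otherwise normalise the codec name.
    return None if encoding is None else codecs.lookup(encoding).name

class EBytes:
    """Hypothetical 'ebytes' whose encoding may be unknown (None)."""

    def __init__(self, data, encoding=None):
        self.data = bytes(data)
        self.encoding = encoding  # goes through the property setter below

    @property
    def encoding(self):
        return self._encoding

    @encoding.setter
    def encoding(self, value):
        self._encoding = _canonical(value)

    def decode(self):
        if self._encoding is None:
            raise ValueError("encoding unknown; set .encoding first")
        return self.data.decode(self._encoding)

    def __add__(self, other):
        if not isinstance(other, EBytes):
            return NotImplemented
        # Matching encodings are kept; a mismatch (or an unknown side) yields None.
        same = self._encoding is not None and self._encoding == other._encoding
        return EBytes(self.data + other.data, self._encoding if same else None)
```

Bytes read off a socket start out as EBytes(data) with encoding None, and can be labelled later once the application has worked out what they are.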

>To me the biggest
>problem with python-2.x's unicode/bytes handling was not that it threw
>exceptions but that it didn't always throw exceptions.  You might test this
>in python2::
>t = u'cafe'
>function(t)
>
>And say, ah my code works.  Then a user gives it this::
>t = u'café'
>function(t)
>
>And get a unicode error because the function only works with unicode in the
>ascii range.

That's an excellent point.
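Toshio's point in Python 3 terms, as a sketch with two hypothetical helpers. The first hides an ASCII conversion (similar in spirit to Python 2's implicit coercion), so it passes a test with "cafe" and fails only on "café". The second mixes bytes and str directly, which Python 3 rejects with TypeError no matter what the data is.

```python
def join_implicit_ascii(prefix, text):
    # Hidden ASCII step: succeeds or fails depending on the data.
    return prefix + text.encode("ascii")

def join_strict(prefix, text):
    # Python 3 refuses bytes + str outright, for every input.
    return prefix + text
```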

>ebytes seems to have the same pitfall where the code path exercised by your
>tests could work with::
>eb = ebytes(b)
>eb.encoding = 'euc-jp'
>function(eb)
>
>but the user exercises a code path that does this and fails::
>eb = ebytes(b)
>function(eb)
>
>What do you think of making the encoding attribute a mandatory part of
>creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).

If ebytes is a separate type, then definitely +1.  If 'ebytes is bytes' then
I'd probably want to default the second argument to the magical "i-don't-know"
marker.

-Barry


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Steve Holden

Laurens Van Houtven wrote:
> On Mon, Jun 21, 2010 at 5:28 PM, Toshio Kuratomi  wrote:
>>  Fedora 14 is about the same.  A nice to have thing that goes along
>> with these would be a table that has packages ported to python3 and which
>> distributions have the python3 version of the package.
> 
> Yeah, this is exactly why I'd prefer to not have to maintain a
> specific list. Big distros are making Python 3.x available, it's not
> the default interpreter yet anywhere (AFAIK?), but that's going to
> happen in the next few releases of said distributions.
> 
> On Mon, Jun 21, 2010 at 5:31 PM, Arc Riley  wrote:
>> Personally, I'd like to celebrate the upcoming Python 3.2 release (which
>> will hopefully include 3to2) with moving all packages which do not have the
>> 'Programming Language :: Python :: 3' classifier to a "Legacy" section of
>> PyPI and offer only Python 3 packages otherwise.  Of course put a banner at
>> the top clearly explaining that Python 2 packages can be found in the Legacy
>> section.
>>
>> Radical, I know, but at some point we really need to make this move.
> 
> I agree we have to make it at some point but I feel this is way, way too 
> early.
> 
> thanks for your continued input,
> Laurens

But it's never too early to plan for something you know to be
inevitable. More planning might have helped earlier on. I don't think
it's likely to hurt now.

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" -
 Ian Dury, 1942-2000

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Bill Janssen

Antoine Pitrou  wrote:

> Benjamin is not qualified to fix OS X bugs AFAIK (if you are, Benjamin,
> then sorry for misrepresenting you :-)). Actually, neither are most of
> us.

Right.  I was thinking that the release manager should however be
responsible for not releasing while there are red buildbots.  But it's
not his fault, either; there are no OS X buildbots on the "stable" list,
and that's the list PEP 101 says to look at.

The real problem here is that a major platform doesn't have a "stable"
buildbot, I think.  I've logged an issue to that effect.

> Apparently some of these buildbots belong to you. Why don't you step
> up and investigate?

The fact that I'm running some buildbots doesn't mean I have to fix the
problems that they reveal, I think.

I did look at the py3kwarn failure, and couldn't figure out the various
twisty passages of deprecation warning as further snarled by the test
package.  I think that one needs someone who's intimately familiar with
the testing framework.

Bill

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Bill Janssen

Benjamin Peterson  wrote:

> 2010/6/21 Bill Janssen :
> > They are at the end of the buildbot list, so off-screen if you are using
> > a normal browser.  You have to scroll to see them.
> 
> But not on the "stable" view and that's the only one I look at.

Right, and properly so.

Bill

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 22, 2010, at 03:08 AM, Stephen J. Turnbull wrote:

>Barry Warsaw writes:
>
> > Would it make sense to have "encoding-carrying" bytes and str
> > types?
>
>Why limit that to bytes and str?  Why not have all objects carry their
>serializer/deserializer around with them?

Only because the .encoding attribute isn't really a serializer/deserializer.
That's still bytes() and str() or the equivalent.  This is just a hint to a
specific serializer for parameters to that action.
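In Python 3 the serializer and deserializer in question are the bytes() and str() constructors, which already take an encoding argument. An .encoding attribute would only record which value to pass. A minimal illustration; the eb_encoding variable stands in for the proposed attribute and is hypothetical.

```python
eb_encoding = "utf-8"  # hypothetical: what an ebytes .encoding attribute would hold

text = "café"
data = bytes(text, eb_encoding)  # str -> bytes
back = str(data, eb_encoding)    # bytes -> str

# The hint matters: the same octets read with another codec give different text.
misread = str(data, "latin-1")
```

Here data is b"caf\xc3\xa9", back equals the original text, and misread is "cafÃ©".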

>I think the answer is "no", though, because (1) it would constitute an
>attractive nuisance (the default would be abused, it would work fine
>in Kansas, and all hell would break loose in Kagoshima, simply
>delaying the pain and/or passing it on to third parties), and (2) you
>really want this under control of higher level objects that have
>access to some knowledge of the environment, rather than the lowest
>level.

I'm still not sure ebytes solves the problem, but it avoids one I'm most
concerned about seeing proposed.  I really really do not want to add
encoding=blah arguments to boatloads of function signatures.

-Barry


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Antoine Pitrou

On Monday 21 June 2010 at 12:57 -0700, Bill Janssen wrote:
>
> > Apparently some of these buildbots belong to you. Why don't you step
> > up and investigate?
> 
> The fact that I'm running some buildbots doesn't mean I have to fix the
> problems that they reveal, I think.

You certainly don't have to. But please don't ask others to do it for
you, *especially* if the failure can't be reproduced under anything else
than OS X, and if no useful diagnosis is available.

Regards

Antoine.



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:

>OTOH, one potential problem with having the encoding on the bytes object
>rather than the ebytes object is that then you can't easily take bytes from a
>socket and then say what encoding they are, without interfering with the
>sockets API (or whatever other place you get the bytes from).

Unless the default was the "I don't know" marker and you were able to set it
after you've done whatever kind of application-level calculation you needed to
do.

-Barry



Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Steve Holden

Terry Reedy wrote:
> On 6/21/2010 8:33 AM, Nick Coghlan wrote:
> 
>> P.S. (We're going to have a tough decision to make somewhere along the
>> line where docs.python.org is concerned, too - when do we flick the
>> switch and make a 3.x version of the docs the default?
> 
> Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'.
> Trunk releases always take over docs.python.org. To do otherwise would
> be to say that 3.2 is not a real trunk release and not yet ready for
> real use -- a major slam.
> 
> Actually, I thought this was already discussed and decided ;-).
> 
This also gives the 2.7 release its day in the sun before relegation to
maintenance status.

The Python 3 documents, when they become the default, should contain an
every-page link to the Python 2 documentation (though linkages may be a
problem - they could probably be done at a gross level).

regards
 Steve


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Stephan Richter

On Monday, June 21, 2010, Barry Warsaw wrote:
>   On Jun 21, 2010, at 11:13 AM, Stephan Richter wrote:
> >I really just want to be able to go to PyPI, Click on "Browse packages"
> >and  then select "Python 3" (it can currently be accomplished by clicking
> >"Python" and then  "3"). Of course, package developers need to be
> >encouraged to add these Trove classifiers so that the listings are as
> >complete as possible.
> 
> Trove classifiers are not particularly user friendly.  I wonder if we can
> help with a (partially) automated or guided tool to help?  Maybe something
> on the web page for packages w/o classifications, kind of like a Linked-in
> progress meter...

Yeah, that would be good. I thought the "Score" was something like that, but it 
is not transparent enough. It would be great if PyPI told me how I could 
improve my package metadata. (The Linked-in progress meter worked for me too. 
;-)

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"

[Python-Dev] Adding additional level of bookmarks and section numbers in python pdf documents.

2010-06-21 Thread Peng Yu

Hi,

The current PDF versions of the Python documents don't have bookmarks for
subsubsections. For example, there is no bookmark for the following
section in python_2.6.5_reference.pdf. Also, the bookmarks don't have
section numbers in them; I suggest including the section numbers.
Could these features be added in a future release of the Python documentation?

3.4.1 Basic customization

-- 
Regards,
Peng

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 03:29 PM, Toshio Kuratomi wrote:

>I wouldn't like this.  It brings us back to the python2 problem where
>sometimes you pass an ebyte into a function and it works and other times you
>pass an ebyte into the function and it issues a traceback.  The coercion
>must end up with a str and no traceback (this assumes that we've checked
>that the ebyte and the encoding "match" when we create the ebyte).

Doing this at ebyte construction time does have the nice benefit of getting
the exception early, and because the ebyte is immutable, you could cache the
results in an attribute on the ebyte.  Well, immutable if the .encoding is
also immutable.  If that can change, then you'd have to re-run the cached
decoding whenever the attribute was set, and there would be a penalty paid
each time this was done.

That, plus the socket use case, does argue for a separate ebytes type.
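A sketch of the construction-time check combined with a cached decoding, using a hypothetical EBytes class. The encoding is read-only, so the cached text can never go stale and no re-decoding penalty arises. Decoding eagerly in __init__ is one possible choice; a lazily filled cache would work too.

```python
class EBytes:
    """Hypothetical immutable 'ebytes' with a decode cache."""

    __slots__ = ("_data", "_encoding", "_text")

    def __init__(self, data, encoding):
        self._data = bytes(data)
        self._encoding = encoding
        # Strict decode up front: a bad encoding raises at construction time,
        # and the result is kept for later str-like use.
        self._text = self._data.decode(encoding)

    @property
    def encoding(self):
        # No setter: changing the encoding would invalidate the cache.
        return self._encoding

    @property
    def text(self):
        return self._text

    def __bytes__(self):
        return self._data
```

Attempting eb.encoding = "latin-1" raises AttributeError, which is what keeps the cached text trustworthy.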

-Barry


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote:

> On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
> > At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> > >What do you think of making the encoding attribute a mandatory part of
> > >creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
> >
> > As long as the coercion rules force str+ebytes (or str % ebytes,
> > ebytes % str, etc.) to result in another ebytes (and fail if the str
> > can't be encoded in the ebytes' encoding), I'm personally fine with
> > it, although I really like the idea of tacking the encoding to bytes
> > objects in the first place.
> >
> I wouldn't like this.  It brings us back to the python2 problem where
> sometimes you pass an ebyte into a function and it works and other times you
> pass an ebyte into the function and it issues a traceback.

For stdlib functions, this isn't going to happen unless your ebytes' 
encoding is not compatible with the ascii subset of unicode, or the 
stdlib function is working with dynamic data...  in which case you 
really *do* want to fail early!

I don't see this as a repeat of the 2.x situation; rather, it allows 
you to cause errors to happen much *earlier* than they would 
otherwise show up if you were using unicode for your encoded-bytes data.

For example, if your program's intent is to end up with latin-1 
output, then it would be better for an error to show up at the very 
*first* point where non-latin1 characters are mixed with your data, 
rather than only showing up at the output boundary!

However, if you promoted mixed-type operation results to unicode 
instead of ebytes, then you:

1) can't preserve data that doesn't have a 1:1 mapping to unicode, and

2) can't detect an error until your data reaches the output point in 
your application -- forcing you to defensively insert ebytes calls 
everywhere (vs. simply wrapping them around a handful of designated 
inputs), or else have to go right back to tracing down where the 
unusable data showed up in the first place.

One thing that seems like a bit of a blind spot for some folks is 
that having unicode is *not* everybody's goal.  Not because we don't 
believe unicode is generally a good thing or anything like that, but 
because we have to work with systems that flat out don't *do* 
unicode, thereby making the presence of (fully-general) unicode an 
error condition that has to be stamped out!

IOW, if you're producing output that has to go into another system 
that doesn't take unicode, it doesn't matter how 
theoretically-correct it would be for your app to process the data in 
unicode form.  In that case, unicode is not a feature: it's a bug.

And as it really *is* an error in that case, it should not pass 
silently, unless explicitly silenced.

> So, what's the advantage of using ebytes instead of bytes?
>
> * It keeps together the text and encoding information when you're taking
>   bytes in and want to give bytes back under the same encoding.
> * It takes some of the boilerplate that people are supposed to do (checking
>   that bytes are legal in a specific encoding) and writes it into the
>   initialization of the object.  That forces you to think about the issue
>   at two points in the code:  when converting into ebytes and when
>   converting out to bytes.  For data that's going to be used with both
>   str and bytes, this is the accepted best practice.  (For exceptions, the
>   byte type remains which you can do conversion on when you want to).

Hm.  For the output case, I suppose that means you might also want 
the text I/O wrappers to be able to be strict about ebytes' encoding.


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Michael Foord


On 21/06/2010 21:02, Antoine Pitrou wrote:
> On Monday 21 June 2010 at 12:57 -0700, Bill Janssen wrote:
>>> Apparently some of these buildbots belong to you. Why don't you step
>>> up and investigate?
>>
>> The fact that I'm running some buildbots doesn't mean I have to fix the
>> problems that they reveal, I think.
>
> You certainly don't have to. But please don't ask others to do it for
> you, *especially* if the failure can't be reproduced under anything else
> than OS X, and if no useful diagnosis is available.


If OS X is a supported and important platform for Python, then fixing all 
problems that it reveals (or being willing to) should definitely not be 
a prerequisite of providing a buildbot (which is already a service to 
the Python developer community). Fixing bugs / failures revealed by 
Bill's buildbot is not fixing them "for Bill"; it is fixing them for Python.


All the best,

Michael


> Regards
>
> Antoine.





--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote:

> On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:
>
> >OTOH, one potential problem with having the encoding on the bytes object
> >rather than the ebytes object is that then you can't easily take bytes from a
> >socket and then say what encoding they are, without interfering with the
> >sockets API (or whatever other place you get the bytes from).
>
> Unless the default was the "I don't know" marker and you were able to set it
> after you've done whatever kind of application-level calculation you needed to
> do.

True, but making it a separate type with a required encoding gets rid 
of the magical "I don't know" - the "I don't know" encoding is just a 
plain old bytes object.

(In principle, you could then drop *all* the stringlike methods from 
plain-old-bytes objects.  If it's really text-in-bytes you want, you 
should use an ebytes with the encoding specified.)


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 01:17 PM, P.J. Eby wrote:

>I'm not really sure how much use the encoding is on a unicode object - what
>would it actually mean?
>
>Hm. I suppose it would effectively mean "this string can be represented in
>this encoding" -- which is useful, in that you could fail operations when
>combining with bytes of a different encoding.

That's basically what I was thinking.

>Hm... no, in that case you should just encode the string to the bytes'
>encoding, and let that throw an error if it fails.  So, really, there's no
>reason for a string to know its encoding.  All you need is the bytes type to
>have an encoding attribute, and when doing mixed-type operations between
>bytes and strings, coerce to *bytes of the same encoding*.

If ebytes were a separate type, and it did the encoding check at constructor
time, and the results of the decoding were cached, then I think you would not
need the equivalent of an estr type.  If you had a string and knew what it
could be encoded to, then you could just coerce it to an ebytes and use the
cached decoded value wherever you needed it.

E.g.

>>> mystring = 'some unicode string'
>>> myencoding = 'iso--foo'
>>> myebytes = ebytes(mystring, myencoding)
>>> myebytes.encoding == myencoding
True
>>> myebytes.string == mystring
True

So ebytes() could accept a str or bytes as its first argument.

>>> mybytes = b'some encoded string'
>>> myebytes = ebytes(mybytes, myencoding)
>>> mybytes == myebytes
True
>>> myebytes.encoding == myencoding
True

In the first example ebytes() encodes mystring to set the internal bytes
representation.  In the second example, ebytes() decodes the bytes to get the
.string attribute value.  In both cases, an exception is raised if the
encoding/decoding fails.

>However, if .encoding is None, then coercion would follow the same rules as
>now -- i.e., convert the bytes to unicode, assuming an ascii encoding.  (This
>would be different than setting an encoding of 'ascii', because in that case,
>it means you want cross-type operations to result in ascii bytes, rather than
>a unicode string, and to fail if the unicode part can't be encoded
>appropriately.  The 'None' setting is effectively a nod to compatibility with
>prior 3.x versions, since I assume we can't just throw out the old coercion
>behavior.)
>
>Then, a few more changes to the bytes type would round out the implementation:
>
>* Allow .decode() to not specify an encoding, unless .encoding is None
>
>* Add back in the missing string methods (e.g. .encode()), since you can 
>transparently upgrade to a string)
>
>* Smart __str__, as shown in your proposal.

If my example above isn't nonsense, then __str__() would just return the
.string attribute.
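A minimal Python 3 sketch of the ebytes behaviour described above might look like the following. It assumes ebytes is a bytes subclass (the thread never settled on an implementation), and it substitutes the real 'latin-1' codec for the placeholder 'iso--foo'. The class is hypothetical, not an existing API.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that always carries its encoding.

    Accepts str (encoded now) or bytes (decoded now), so a wrong encoding
    fails at construction time; the decoded text is cached as .string.
    """

    def __new__(cls, data, encoding):
        if isinstance(data, str):
            raw = data.encode(encoding)   # raises UnicodeEncodeError on failure
            text = data
        else:
            raw = bytes(data)
            text = raw.decode(encoding)   # raises UnicodeDecodeError on failure
        self = super().__new__(cls, raw)
        self.encoding = encoding
        self.string = text                # cached decoded value
        return self

    def __str__(self):
        # str() simply hands back the cached text.
        return self.string


mystring = "some unicode string"
myebytes = ebytes(mystring, "latin-1")
print(myebytes.encoding, myebytes.string == mystring)   # latin-1 True

mybytes = b"some encoded string"
print(mybytes == ebytes(mybytes, "latin-1"))             # True

try:
    ebytes("caf\u00e9", "ascii")
except UnicodeEncodeError as exc:
    print("rejected at construction:", exc.reason)
```

Because the check happens in the constructor, any ebytes object that exists is known to round-trip between its bytes and its cached string.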

>In short, +1.  (I wish it were possible to go back and make bytes non-strings
>and have only this ebytes or bstr or whatever type have string methods, but
>I'm pretty sure that ship has already sailed.)

Maybe it's PEP time?  No, I'm not volunteering. ;)

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 21, 2010, at 04:16 PM, P.J. Eby wrote:

>At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote:
>>On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:
>>
>> >OTOH, one potential problem with having the encoding on the bytes object
>> >rather than the ebytes object is that then you can't easily take bytes from a
>> >socket and then say what encoding they are, without interfering with the
>> >sockets API (or whatever other place you get the bytes from).
>>
>>Unless the default was the "I don't know" marker and you were able to set it
>>after you've done whatever kind of application-level calculation you needed to
>>do.
>
>True, but making it a separate type with a required encoding gets rid of the 
>magical "I don't know" - the "I don't know" encoding is just a plain old bytes 
>object.
>
>(In principle, you could then drop *all* the stringlike methods from 
>plain-old-bytes objects.  If it's really text-in-bytes you want, you should 
>use an ebytes with the encoding specified.)

Yep, agreed!
-Barry



Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Antoine Pitrou

On Monday, June 21, 2010 at 21:13 +0100, Michael Foord wrote:
> 
> If OS X is a supported and important platform for Python then fixing all 
> problems that it reveals (or being willing to) should definitely not be 
> a pre-requisite of providing a buildbot (which is already a service to 
> the Python developer community). Fixing bugs / failures revealed by 
> Bill's buildbot is not fixing them "for Bill" it is fixing them for Python.

I didn't say it was a prerequisite. I was merely pointing out that when
platform-specific bugs appear, people using the specific platform should
be helping if they want to actually encourage the fixing of these bugs.

OS X is only "a supported and important platform" if we have dedicated
core developers diagnosing or even fixing issues for it (like we
obviously have for Windows and Linux). Otherwise, I don't think we have
any moral obligation to support it.

Regards

Antoine.



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi

On Mon, Jun 21, 2010 at 02:46:57PM -0400, P.J. Eby wrote:
> At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:
> >Nick alluded to The One Obvious Way as a change in architecture.
> >
> >Specifically: Decode all bytes to typed objects (str, images, audio,
> >structured objects) at input.  Do no manipulations on bytes ever
> >except decode and encode (both to text, and to special-purpose objects
> >such as images) in a program that does I/O.
> 
> This ignores the existence of use cases where what you have is text
> that can't be properly encoded in unicode.  I know, it's a hard thing
> to wrap one's head around, since on the surface it sounds like
> unicode is the programmer's savior.  Unfortunately, real-world text
> data exists which cannot be safely roundtripped to unicode, and must
> be handled in "bytes with encoding" form for certain operations.
> 
> I personally do not have to deal with this *particular* use case any
> more -- I haven't been at NTT/Verio for six years now.  But I do know
> it exists for e.g. Asian language email handling, which is where I
> first encountered it.  At the time (this *may* have changed), many
> popular email clients did not actually support unicode, so you
> couldn't necessarily just send off an email in UTF-8.  It drove us
> nuts on the project where this was involved (an i18n of an existing
> Python app), and I think we had to compromise a bit in some fashion
> (because we couldn't really avoid unicode roundtripping due to
> database issues), but the use case does actually exist.
> 
> My current needs are simpler, thank goodness.  ;-)  However, they
> *do* involve situations where I'm dealing with *other*
> encoding-restricted legacy systems, such as software for interfacing
> with the US Postal Service that only works with a restricted subset
> of latin1, while receiving mangled ASCII from an ecommerce provider,
> and storing things in what's effectively a latin-1 database.  Being
> able to easily assert what kind of bytes I've got would actually let
> me catch errors sooner, *if* those assertions were being checked when
> different kinds of strings or bytes were being combined (i.e., at
> coercion time).
> 
While it's certainly possible that you have a grapheme that has no
corresponding unicode codepoint, it doesn't sound like this is the case
you're dealing with here.  You talk about "restricted subset of latin1"
but all of latin1's graphemes have unicode codepoints.  You also talk about
not being able to "send off an email in UTF-8" but UTF-8 is an encoding of
unicode, not unicode itself.  Similarly, the statement that some email
clients don't support unicode isn't very clear as to the actual problem.  The
email client supports displaying graphemes using glyphs present on the
computer.  As long as the graphemes needed have a unicode codepoint, using
unicode inside of your application and then encoding to bytes on the way out
works fine.
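To illustrate the decode-on-input, encode-on-output pattern described above, here is a small sketch. The function name handle, the latin-1 input and the two output encodings are assumptions made for the example, not anything from the thread.

```python
def handle(raw_input: bytes) -> dict:
    # Decode once, at the input boundary, using the encoding the source is
    # known to use (assumed to be latin-1 here).
    text = raw_input.decode("latin-1")

    # All manipulation happens on str, never on bytes.
    greeting = "Bonjour, " + text.strip().title()

    # Encode once, at each output boundary, to whatever that sink needs.
    return {
        "utf8_sink": greeting.encode("utf-8"),
        "latin1_sink": greeting.encode("latin-1"),
    }


out = handle(b"  ren\xe9e  ")
print(out["latin1_sink"])   # b'Bonjour, Ren\xe9e'
print(out["utf8_sink"])     # b'Bonjour, Ren\xc3\xa9e'
```

The application logic in the middle never needs to know which byte encoding either side uses.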

Even in cases where there's no unicode codepoint for the grapheme that
you're receiving, unicode gives you a way out.  It provides a private use
area where you can map the graphemes to unused codepoints.  Your
application keeps a mapping from each such codepoint to the particular byte
sequence that you want.  Then you write a codec that converts from unicode w/
these private codepoints into your particular encoding (and from bytes into
unicode).
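A sketch of such a codec, registered with the standard codecs module, could look like this. The legacy encoding itself ("legacydemo", ASCII plus one vendor byte 0x80 with no Unicode equivalent) is entirely hypothetical, and only 'strict' error handling is implemented.

```python
import codecs

# Hypothetical legacy encoding: bytes 0x00-0x7F are ASCII, and byte 0x80 is a
# vendor-specific glyph with no Unicode equivalent. We park that glyph at
# U+E000, the first code point of the Private Use Area.
_LEGACY_BYTE = 0x80
_PUA_CHAR = "\ue000"


def _encode(text, errors="strict"):
    # Only 'strict' error handling is sketched here.
    out = bytearray()
    for i, ch in enumerate(text):
        if ch == _PUA_CHAR:
            out.append(_LEGACY_BYTE)
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise UnicodeEncodeError("legacydemo", text, i, i + 1,
                                     "not representable")
    return bytes(out), len(text)


def _decode(data, errors="strict"):
    data = bytes(data)  # accept bytes, bytearray or memoryview
    chars = []
    for i, byte in enumerate(data):
        if byte == _LEGACY_BYTE:
            chars.append(_PUA_CHAR)
        elif byte < 0x80:
            chars.append(chr(byte))
        else:
            raise UnicodeDecodeError("legacydemo", data, i, i + 1,
                                     "unmapped byte")
    return "".join(chars), len(data)


def _search(name):
    # codecs.lookup() hands the search function a normalized, lower-case name.
    if name == "legacydemo":
        return codecs.CodecInfo(_encode, _decode, name="legacydemo")
    return None


codecs.register(_search)

raw = b"Ref \x80 42"
text = raw.decode("legacydemo")
assert text == "Ref \ue000 42"           # the glyph lives at U+E000 in memory
assert text.encode("legacydemo") == raw  # and round-trips exactly on output
```

Inside the program the glyph is an ordinary str character, so slicing, searching and formatting all work; only the codec knows how to map it back.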

-Toshio



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread M.-A. Lemburg

Barry Warsaw wrote:
> On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:
> 
>> I like the idea of having encoding information carried with the data.
>> I don't think that an ebytes type that can *optionally* have an encoding
>> attribute makes the situation less confusing, though.
> 
> Agreed.  I think the attribute should always be there, but there probably
> needs to be a magic value (perhaps None) that indicates an unknown, manual,
> garbage, error, broken encoding.
> 
> Examples: you read bytes off a socket and don't know what the encoding is; you
> concatenate two ebytes that have incompatible encodings.

Such extra information tends to be lost whenever you pass the
bytes data through a C level API or some other function that
doesn't know about the special nature of those objects, treating
them just like any bytes object.

It may sound nice in theory, but in practice it doesn't work out.

Besides, if you do know the encoding, you can easily carry the
data around in a Unicode str object.

The problem lies elsewhere: What to do with a piece of text for
which you don't know the encoding and how to combine that piece
of text with other pieces of text for which you do know the
encoding.

There are a few options at hand:

 * you keep working on the bytes data and only convert things
   to Unicode when needed and where the encoding is known

 * you decode the bytes data for which you don't have the encoding
   information into some special Unicode form (eg. using the
   surrogateescape error handler) and hope that when the time
   comes to encode the Unicode data back into bytes, the codec
   supports reversing the conversion

 * you manage the data as a list of Unicode str and
   bytes objects and don't even try to be clever about encodings
   of text with unknown encoding

It depends a lot on the use case, which of these options fits
best.
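The second option can be sketched in Python 3 with the standard surrogateescape error handler (introduced by PEP 383). The sample header bytes are made up for illustration.

```python
# Bytes of unknown encoding (they happen to be latin-1 in this example).
raw = b"Subject: caf\xe9"

# surrogateescape maps each byte that is not valid ASCII to a lone surrogate
# in U+DC80..U+DCFF instead of raising.
text = raw.decode("ascii", errors="surrogateescape")
print(repr(text))                      # 'Subject: caf\udce9'

# Text-level work on the decodable parts is possible...
header_name, _, value = text.partition(": ")
print(header_name)                     # Subject

# ...and encoding with the same handler restores the original bytes.
assert text.encode("ascii", errors="surrogateescape") == raw

# Without the handler, the encoder refuses the lone surrogate.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```

The round trip only works if the same handler is used on the way out, which is the "hope" the second bullet refers to.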

>> To me the biggest
>> problem with python-2.x's unicode/bytes handling was not that it threw
>> exceptions but that it didn't always throw exceptions.  You might test this
>> in python2::
>>
>>     t = u'cafe'
>>     function(t)
>>
>> And say, ah my code works.  Then a user gives it this::
>>
>>     t = u'café'
>>     function(t)
>>
>> And get a unicode error because the function only works with unicode in the
>> ascii range.
> 
> That's an excellent point.

Here's a little known fact: by changing the Python2 default
encoding to 'undefined' (yes, that's a real codec !), you can disable
all automatic string coercion in Python2.
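For comparison, here is a Python 3 sketch (the 'undefined' trick above applies to Python 2's default encoding, which Python 3 no longer uses for implicit coercion). Mixing bytes and str fails the same way whether or not the text is ASCII, and the 'undefined' codec still ships with CPython and refuses every conversion. The function label is a hypothetical stand-in for the quoted function(t).

```python
import codecs


def label(t):
    # Hypothetical stand-in for the quoted function(t): it glues a bytes
    # prefix onto whatever text it receives.
    return b"name: " + t


results = {}
for t in ("cafe", "caf\u00e9"):
    try:
        label(t)
        results[t] = "ok"
    except TypeError:
        results[t] = "TypeError"

# Both the ASCII and the non-ASCII input fail, so the bug shows up in testing.
print(results)

# The 'undefined' codec refuses every conversion; as the Python 2 default
# encoding it turns each implicit str/unicode coercion into an error.
info = codecs.lookup("undefined")
try:
    "cafe".encode("undefined")
    undefined_failed = False
except UnicodeError:
    undefined_failed = True
print(info.name, undefined_failed)
```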

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 21 2010)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2010-07-19: EuroPython 2010, Birmingham, UK             27 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Bill Janssen

Antoine Pitrou  wrote:

> On Monday, June 21, 2010 at 12:57 -0700, Bill Janssen wrote:
> >
> > > Apparently some of these buildbots belong to you. Why don't you step
> > > up and investigate?
> > 
> > The fact that I'm running some buildbots doesn't mean I have to fix the
> > problems that they reveal, I think.
> 
> You certainly don't have to. But please don't ask others to do it for
> you, *especially* if the failure can't be reproduced under anything else
> than OS X, and if no useful diagnosis is available.

I'm more concerned about doing it for *us*, rather than for *me*.  Yes,
an OS X machine would be required to poke at it, but I doubt I'm the
only one here with an OS X machine :-).  If I am, that's a problem, and
we as a community should do something about that.

I downloaded 2.7rc2 and built it on my Intel OS X 10.5.8 machine.  It
still fails the test_uuid test:

% make test
[...]
test_uuid
test test_uuid failed -- Traceback (most recent call last):
  File "/private/tmp/Python-2.7rc2/Lib/test/test_uuid.py", line 472, in testIssue8621
    self.assertNotEqual(parent_value, child_value)
AssertionError: '8395a08e40454895be537a180539b7fb' == '8395a08e40454895be537a180539b7fb'

[...]

However, when I run it directly:

% ./python.exe -Wd -3 -E -tt ./Lib/test/regrtest.py -v test_uuid
== CPython 2.7rc2 (r27rc2:82137, Jun 21 2010, 12:50:22) [GCC 4.0.1 (Apple Inc. build 5493)]
==   Darwin-9.8.0-i386-32bit little-endian
==   /private/tmp/Python-2.7rc2/build/test_python_58012
test_uuid
testIssue8621 (test.test_uuid.TestUUID) ... ok
test_UUID (test.test_uuid.TestUUID) ... ok
test_exceptions (test.test_uuid.TestUUID) ... ok
test_getnode (test.test_uuid.TestUUID) ... ok
test_ifconfig_getnode (test.test_uuid.TestUUID) ... ok
test_ipconfig_getnode (test.test_uuid.TestUUID) ... ok
test_netbios_getnode (test.test_uuid.TestUUID) ... ok
test_random_getnode (test.test_uuid.TestUUID) ... ok
test_unixdll_getnode (test.test_uuid.TestUUID) ... ok
test_uuid1 (test.test_uuid.TestUUID) ... ok
test_uuid3 (test.test_uuid.TestUUID) ... ok
test_uuid4 (test.test_uuid.TestUUID) ... ok
test_uuid5 (test.test_uuid.TestUUID) ... ok
test_windll_getnode (test.test_uuid.TestUUID) ... ok

----------------------------------------------------------------------
Ran 14 tests in 0.087s

OK
1 test OK.
%

So I don't know what to think.

The same thing happens with the py3kwarn test:

% ./python.exe -Wd -3 -E -tt ./Lib/test/regrtest.py -v test_py3kwarn
== CPython 2.7rc2 (r27rc2:82137, Jun 21 2010, 12:50:22) [GCC 4.0.1 (Apple Inc. build 5493)]
==   Darwin-9.8.0-i386-32bit little-endian
==   /private/tmp/Python-2.7rc2/build/test_python_58057
test_py3kwarn
test_backquote (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_buffer (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_builtin_function_or_method_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_cell_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_code_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_dict_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_file_xreadlines (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_forbidden_names (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_frame_attributes (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_hash_inheritance (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_methods_members (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_object_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_operator (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_paren_arg_names (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_slice_methods (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_softspace (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_sort_cmp_arg (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_sys_exc_clear (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_tuple_parameter_unpacking (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_type_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok
test_mutablestring_removal (test.test_py3kwarn.TestStdlibRemovals) ... ok
test_optional_module_removals (test.test_py3kwarn.TestStdlibRemovals) ... ok
test_os_path_walk (test.test_py3kwarn.TestStdlibRemovals) ... ok
test_platform_independent_removals (test.test_py3kwarn.TestStdlibRemovals) ... ok
test_platform_specific_removals (test.test_py3kwarn.TestStdlibRemovals) ...
/private/tmp/Python-2.7rc2/Lib/plat-mac/findertools.py:303: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def _setlocation(object_alias, (x, y)):
/private/tmp/Python-2.7rc2/Lib/plat-mac/findertools.py:445: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def _setwindowsize(folder_alias, (w, h)):
/private/tmp/Python-2.7rc2/Lib/plat-mac/findertools.py:496: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def _setwindowposition(folder_alias, (x, y)):
ok
test_reduce_move (test.test_py3kwarn.TestStdlibRemovals) ... ok

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Lennart Regebro

On Sun, Jun 20, 2010 at 02:02, Terry Reedy  wrote:
> After reading the discussion in the previous thread, I signed in to #python
> and verified that the intro message starts with a lie about python3. I also
> verified that the official #python site links to "Python Commandment Don't
> use Python 3… yet".

Well, it *should* say: "If you need to ask if you should use Python 2
or Python 3, you probably are better off with Python 2 for the
moment". But that's a bit long. :-)

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Lennart Regebro

On Sun, Jun 20, 2010 at 18:20, Laurens Van Houtven  wrote:
> 2.x or 3.x? http://tinyurl.com/py2or3

Wow. That's almost not an improvement... That link doesn't really help
anyone choose at all.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Martin v. Löwis


> If OS X is a supported and important platform for Python then fixing all
> problems that it reveals (or being willing to) should definitely not be
> a pre-requisite of providing a buildbot (which is already a service to
> the Python developer community). Fixing bugs / failures revealed by
> Bill's buildbot is not fixing them "for Bill" it is fixing them for Python.


I wish people would stop using the word "supported" when they talk about 
free software. *No* system is "supported" by Python - not even in the 
sense "we strive to pass the test suite". "We" don't.


Now, one may argue whether failing buildbots should be an unconditional 
reason to defer the release. I personally would say "no", despite what 
some PEP may say. People proposing that a release is postponed typically 
hope that somebody gets frustrated enough to step up and fix the bug, 
just so that the software gets released.


Instead, I would propose that the only way to delay a release is by 
proposing to take some specific action to remedy the situation that 
should cause the delay. Otherwise, releasing is at the discretion of the 
release manager, who has the ultimate say to whether the problem is 
important or not.


As for OSX, it seems that the only test that is failing is the ctypes 
test suite, and there only a single test. I don't think this is 
sufficient reason to block the release.


Regards,
Martin

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Bill Janssen

Antoine Pitrou  wrote:

> OS X is only "a supported and important platform" if we have dedicated
> core developers diagnosing or even fixing issues for it (like we
> obviously have for Windows and Linux). Otherwise, I don't think we have
> any moral obligation to support it.

Fair enough.

That being said, there are two classes of OS X issues.  The first is the
kind of thing that Ronald Oussoren and Ned Deily keep fixing for us,
which require a knowledge of OS X frameworks and SDKs and various other
deeply-Apple oddnesses.  But the second class is a set of UNIX issues,
where OS X is just a variant of UNIX with minor differences from other
UNIX platforms.

It looks to me as if we don't really need Apple geeks for the second
class of issues, we just need developers who have a Mac to test on.

It looks to me, for instance, as if the failures in test_py3kwarn and
test_uuid on Leopard are bugs in the Python testing framework that
happen to be exercised on OS X, rather than bugs caused in some way by
the platform.  There, the requisite knowledge is, how does regrtest.py
really work?

Bill

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Martin v. Löwis


On 21.06.2010 21:45, Michael Foord wrote:

> On 21/06/2010 20:30, Benjamin Peterson wrote:
>
>> 2010/6/21 Bill Janssen:
>>
>>> They are at the end of the buildbot list, so off-screen if you are using
>>> a normal browser. You have to scroll to see them.
>>
>> But not on the "stable" view and that's the only one I look at.
>
> What are the requirements for moving the OS X buildbots into the stable
> view? Are the builders themselves stable enough? (If the requirement is
> that the buildbots be green then it is something of a catch-22.)


It is indeed the latter (at least, how I understand it). The builder 
should "usually" give green, which means it should have done so over 
some extended period of time. If it then gets broken it means that 
somebody actually broke the code, rather than the system showing one of 
its glitches.


So asking for addition to the stable list *while* the slave is red is a 
bad idea.


FWIW, nobody has requested changing any of the build slaves to "stable" 
for the last two years or so.


Regards,
Martin


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Michael Foord


On 21/06/2010 22:12, "Martin v. Löwis" wrote:

>> If OS X is a supported and important platform for Python then fixing all
>> problems that it reveals (or being willing to) should definitely not be
>> a pre-requisite of providing a buildbot (which is already a service to
>> the Python developer community). Fixing bugs / failures revealed by
>> Bill's buildbot is not fixing them "for Bill" it is fixing them for Python.
>
> I wish people would stop using the word "supported" when they talk
> about free software. *No* system is "supported" by Python - not even
> in the sense "we strive to pass the test suite". "We" don't.




Well, for better or for worse I think "we" do. We certainly *strive* to 
support these platforms and having the buildbots is a big part of this.


> Now, one may argue whether failing buildbots should be an
> unconditional reason to defer the release. I personally would say
> "no", despite what some PEP may say. People proposing that a release
> is postponed typically hope that somebody gets frustrated enough to
> step up and fix the bug, just so that the software gets released.
>
> Instead, I would propose that the only way to delay a release is by
> proposing to take some specific action to remedy the situation that
> should cause the delay. Otherwise, releasing is at the discretion of
> the release manager, who has the ultimate say to whether the problem
> is important or not.




I would agree with leaving it to the discretion of the release manager
and we should aim for, rather than hard require, all stable buildbots to
be green. I would still *expect* that a release manager would look at 
the stable buildbots before cutting a release.



> As for OSX, it seems that the only test that is failing is the ctypes
> test suite, and there only a single test. I don't think this is
> sufficient reason to block the release.


Bill listed several other failures he saw on the buildbots and I see the 
same set, plus test_posix.


All the best,

Michael


> Regards,
> Martin




Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Simon de Vlieger



On 21 jun 2010, at 23:03, Lennart Regebro wrote:

> On Sun, Jun 20, 2010 at 18:20, Laurens Van Houtven wrote:
>
>> 2.x or 3.x? http://tinyurl.com/py2or3
>
> Wow. That's almost not an improvement... That link doesn't really help
> anyone choose at all.


Lennart,

That part of the topic will be replaced after all feedback is gathered
on the new article Laurens provided at:
http://python-commandments.org/python3.html as stated earlier in this thread.


Regards,

Simon de Vlieger

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread James Y Knight


On Jun 21, 2010, at 4:29 PM, M.-A. Lemburg wrote:

> Here's a little known fact: by changing the Python2 default
> encoding to 'undefined' (yes, that's a real codec !), you can disable
> all automatic string coercion in Python2.


I tried that once: half the stdlib stops working if you do (for  
example, the re module), so it's not particularly useful for checking  
if your own code is unicode-safe.


James

Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Lennart Regebro

On Mon, Jun 21, 2010 at 23:26, Simon de Vlieger  wrote:
> That part of the topic will be replaced after all feedback is gathered on
> the new article Laurens provided at:
> http://python-commandments.org/python3.html as stated earlier in this
> thread.

OK, great, I missed that!

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Martin v. Löwis


> Bill listed several other failures he saw on the buildbots and I see the
> same set, plus test_posix.


Still, the question would be whether any of these failures can manage to 
block a release. Are they regressions from 2.6? That would make them 
good candidates for release blockers. Except that I still would like to 
see commitment from somebody to fix them or else they can't block the 
release: if "we" don't mean that supporting a platform also means 
volunteering to fix bugs, then I guess "we" should stop declaring the
platform supported. Just wishing that it was supported actually doesn't 
make it so.


If the test failure *isn't* a regression, I think it shouldn't block the 
release.


Regards,
Martin





Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi

On Mon, Jun 21, 2010 at 04:09:52PM -0400, P.J. Eby wrote:
> At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> >On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
> >> At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> >> >What do you think of making the encoding attribute a mandatory part of
> >> >creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
> >>
> >> As long as the coercion rules force str+ebytes (or str % ebytes,
> >> ebytes % str, etc.) to result in another ebytes (and fail if the str
> >> can't be encoded in the ebytes' encoding), I'm personally fine with
> >> it, although I really like the idea of tacking the encoding to bytes
> >> objects in the first place.
> >>
> >I wouldn't like this.  It brings us back to the python2 problem where
> >sometimes you pass an ebyte into a function and it works and other times you
> >pass an ebyte into the function and it issues a traceback.
> 
> For stdlib functions, this isn't going to happen unless your ebytes'
> encoding is not compatible with the ascii subset of unicode, or the
> stdlib function is working with dynamic data...  in which case you
> really *do* want to fail early!
> 
The ebytes encoding will often be incompatible with the ascii subset.
It's the reason that people were so often tempted to change the
defaultencoding on python2 to utf8.

> I don't see this as a repeat of the 2.x situation; rather, it allows
> you to cause errors to happen much *earlier* than they would
> otherwise show up if you were using unicode for your encoded-bytes
> data.
> 
> For example, if your program's intent is to end up with latin-1
> output, then it would be better for an error to show up at the very
> *first* point where non-latin1 characters are mixed with your data,
> rather than only showing up at the output boundary!
> 
That highly depends on your usage.  If you're formatting a comment on a web
page, checking at output and replacing with '?' is better than a traceback.
If you're entering key values into a database, then you likely want to know
where the non-latin1 data is entering your program, not where it's mixed
with your data or the output boundary.

> However, if you promoted mixed-type operation results to unicode
> instead of ebytes, then you:
> 
> 1) can't preserve data that doesn't have a 1:1 mapping to unicode, and
> 
ebytes should be immutable like bytes and str.  So you shouldn't lose the
data if you keep a reference to it.

> 2) can't detect an error until your data reaches the output point in
> your application -- forcing you to defensively insert ebytes calls
> everywhere (vs. simply wrapping them around a handful of designated
> inputs), or else have to go right back to tracing down where the
> unusable data showed up in the first place.
> 
Usually, you don't want to know where you are combining two incompatible
strings.  Instead, you want to know where the incompatible strings are being
set in the first place.  If function(a, b) tracebacks with certain
combinations of a and b, I need to know where a and b are being set, not
where function(a, b) is in the source code.  So you need to be making input
values ebytes() (or str in current python3) no matter what.

> One thing that seems like a bit of a blind spot for some folks is
> that having unicode is *not* everybody's goal.  Not because we don't
> believe unicode is generally a good thing or anything like that, but
> because we have to work with systems that flat out don't *do*
> unicode, thereby making the presence of (fully-general) unicode an
> error condition that has to be stamped out!
> 
I sometimes think that as well.  However, here I think you're in a bit of
a blind spot yourself.  I'm saying that making ebytes + str coerce to ebytes
will only yield a traceback some of the time, which is the python2
behaviour.  Having ebytes + str coerce to str will never throw a traceback
as long as our implementation checks that the bytes and encoding work
together from the start.

Throwing an error only on some inputs is one of the main reasons
that debugging unicode vs byte issues sucks on python2.  On my box, with my
dataset, everything works.  Toss it up on PyPI and suddenly I have a user in
Japan who reports that he gets a traceback with his dataset that he can't
give to me because it's proprietary, overly large, or transient.
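The data-dependent failure described above can be seen without any new type: an "ebytes + str -> ebytes" rule would have to encode the str into the ebytes' encoding, and whether that succeeds depends entirely on the input. A minimal sketch, assuming a latin-1 target; `coerce_into` is a hypothetical helper, not a stdlib function:

```python
def coerce_into(encoding: str, text: str) -> bytes:
    # What an "ebytes + str -> ebytes" coercion would have to do internally:
    # encode the str with the ebytes' encoding, using strict error handling.
    return text.encode(encoding)

# The same call succeeds or raises depending only on the data passed in.
results = {}
for sample in ["hello", "caf\u00e9", "\u3046\u308b\u3055\u3044"]:
    try:
        results[sample] = coerce_into("latin-1", sample)
    except UnicodeEncodeError:
        results[sample] = None  # only the Japanese sample fails here
print(results)
```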



> IOW, if you're producing output that has to go into another system
> that doesn't take unicode, it doesn't matter how
> theoretically-correct it would be for your app to process the data in
> unicode form.  In that case, unicode is not a feature: it's a bug.
> 
This is not always true.  Suppose you read a webpage, chop it up so you get
a list of words, create a histogram of word lengths, and then write the output as
utf8 to a database.  Should you do all your intermediate string operations
on utf8 encoded byte strings?  No, you should do them on unicode strings, as
otherwise you need to know about the details of how utf8 encodes characters.
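To make the histogram example concrete: measuring word lengths on UTF-8 encoded bytes counts octets rather than characters, so every non-ASCII word is miscounted. A minimal sketch using only the standard library; the sample text is made up for illustration:

```python
from collections import Counter

text = "na\u00efve caf\u00e9 tea"  # "naïve café tea"
words = text.split()

# Word lengths measured on unicode strings count characters.
char_hist = Counter(len(w) for w in words)

# The same words as UTF-8 bytes: each non-ASCII character here takes 2 octets.
byte_hist = Counter(len(w.encode("utf-8")) for w in words)

print(sorted(char_hist.items()))  # [(3, 1), (4, 1), (5, 1)]
print(sorted(byte_hist.items()))  # [(3, 1), (5, 1), (6, 1)]
```

Encoding to UTF-8 only when writing to the database keeps the counting logic free of any knowledge of the encoding.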


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Laurens Van Houtven

On Mon, Jun 21, 2010 at 11:03 PM, Lennart Regebro  wrote:
> On Sun, Jun 20, 2010 at 18:20, Laurens Van Houtven  wrote:
>> 2.x or 3.x? http://tinyurl.com/py2or3
>
> Wow. That's almost not an improvement... That link doesn't really help
> anyone choose at all.
>
> --
> Lennart Regebro: Python, Zope, Plone, Grok
> http://regebro.wordpress.com/
> +33 661 58 14 64
>

Please read the rest of the thread: that's ancient information and no
longer the latest work. We just removed the thing that offended
people, so that the situation could be defused instantly and then we
could work towards something everyone liked in a calm and productive
environment.

Laurens

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread John Arbash Meinel


...
>> IOW, if you're producing output that has to go into another system
>> that doesn't take unicode, it doesn't matter how
>> theoretically-correct it would be for your app to process the data in
>> unicode form.  In that case, unicode is not a feature: it's a bug.
>>
> This is not always true.  If you read a webpage, chop it up so you get
> a list of words, create a histogram of word length, and then write the output as
> utf8 to a database.  Should you do all your intermediate string operations
> on utf8 encoded byte strings?  No, you should do them on unicode strings as
> otherwise you need to know about the details of how utf8 encodes characters.
> 

You'd still have problems in Unicode given stuff like å =~ å, even though
one is u'\xe5' and the other u'a\u030a' (whether those look the same depends
on your Unicode system: IDLE shows them pretty much the same, while T-Bird on
Windows with my current font shows the second as 2 characters.)

I realize this was a toy example, but it does point out that Unicode
complicates the idea of 'equality' as well as the idea of 'what is a
character'. And just saying "decode it to Unicode" isn't really sufficient.
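John's point can be checked directly by comparing the two spellings of å. A short sketch assuming Python 3; it shows that they differ in equality, in code-point count, and in UTF-8 byte length even though they can render alike:

```python
precomposed = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"  # "a" followed by COMBINING RING ABOVE

print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2
print(precomposed.encode("utf-8"))        # b'\xc3\xa5'
print(decomposed.encode("utf-8"))         # b'a\xcc\x8a'
```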

John
=:->


Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Michael Foord


On 21/06/2010 22:36, "Martin v. Löwis" wrote:
>> Bill listed several other failures he saw on the buildbots and I see the
>> same set, plus test_posix.
>
> Still, the question would be whether any of these failures can manage
> to block a release. Are they regressions from 2.6?


The test_posix failure is a regression from 2.6 (but it only shows up on 
some machines - it is caused by a fairly braindead implementation of a 
couple of posix apis by Apple apparently).


http://bugs.python.org/issue7900

There are various patches available and a lot of work that has gone into 
diagnosing it - but there was some disagreement on what is the *best* 
way to fix it.


Two of the other failures I'm pretty sure are problems in the test suite 
rather than bugs (as Bill said) and I'm not sure about the ctypes issue. 
Just starting a full build here.


Michael
--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.



Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Michael Foord


On 21/06/2010 22:52, Michael Foord wrote:
> On 21/06/2010 22:36, "Martin v. Löwis" wrote:
>>> Bill listed several other failures he saw on the buildbots and I see
>>> the same set, plus test_posix.
>>
>> Still, the question would be whether any of these failures can manage
>> to block a release. Are they regressions from 2.6?
>
> The test_posix failure is a regression from 2.6 (but it only shows up
> on some machines - it is caused by a fairly braindead implementation
> of a couple of posix apis by Apple apparently).
>
> http://bugs.python.org/issue7900
>
> There are various patches available and a lot of work that has gone
> into diagnosing it - but there was some disagreement on what is the
> *best* way to fix it.
>
> Two of the other failures I'm pretty sure are problems in the test
> suite rather than bugs (as Bill said) and I'm not sure about the
> ctypes issue. Just starting a full build here.


Right now I'm *only* seeing these two failures on Mac OS X (10.6.4):

test_posix test_urllib2_localnet

All the best,

Michael




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Nick Coghlan

On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby  wrote:
> True, but making it a separate type with a required encoding gets rid of the
> magical "I don't know" - the "I don't know" encoding is just a plain old
> bytes object.

So, to boil down the ebytes idea, it is basically a request for a
second string type that holds an octet stream plus an encoding name,
rather than a Unicode character stream. Calling it "ebytes" seems to
emphasise the wrong parallel in that case (you have a 'str' object
with a different internal structure, not any kind of bytes object).
For now I'll call it an "altstr". Then the idea can be described as

- altstr would expose the same API as str, NOT the same API as bytes
- explicit conversion via "str" would use the altstr's __str__ method
- explicit conversion via "bytes" would use the altstr's __bytes__ method
- implicit interaction with str would convert the str to an altstr
object according to the altstr's rules. This may be best handled via a
coercion method on altstr, rather than str actually needing to know
the details (i.e. an altstr.__coerce_str__() method). For the
'ebytes' model, this would do something like
"type(self)(other.encode(self.encoding), self.encoding)". The
operation would then be handled by the corresponding method on the
coerced object. A new type could then override operations such as
__contains__, __mod__, format() and join().
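The coercion rule in the last bullet can be prototyped in pure Python. Because `str` knows nothing about the new type, an expression like `"Re: " + obj` falls through to the right operand's `__radd__`, so the hook can live entirely on the altstr side. A rough sketch only: `AltStr` and `__coerce_str__` are the hypothetical names from the list above (Python has no such hook), the "ebytes" rule is used for coercion, and only `+` is wired up (`%`, `format()`, `join()` and `__contains__` are left out).

```python
class AltStr:
    """Hypothetical altstr: an octet stream plus an encoding name."""

    def __init__(self, data: bytes, encoding: str):
        data.decode(encoding)  # check up front that bytes and encoding agree
        self._data = data
        self.encoding = encoding

    def __str__(self) -> str:
        return self._data.decode(self.encoding)

    def __bytes__(self) -> bytes:
        return self._data

    def __repr__(self) -> str:
        return f"AltStr({self._data!r}, {self.encoding!r})"

    def __coerce_str__(self, other: str) -> "AltStr":
        # The 'ebytes' rule: encode the str with our encoding. This raises
        # UnicodeEncodeError if the str has characters the encoding lacks.
        return type(self)(other.encode(self.encoding), self.encoding)

    def __add__(self, other):
        if isinstance(other, str):
            other = self.__coerce_str__(other)
        if isinstance(other, AltStr) and other.encoding == self.encoding:
            return type(self)(self._data + other._data, self.encoding)
        return NotImplemented

    def __radd__(self, other):
        # Reached for str + AltStr; str itself needs no knowledge of AltStr.
        if isinstance(other, str):
            return self.__coerce_str__(other) + self
        return NotImplemented


base = AltStr(b"Subject: ", "latin-1")
combined = "Re: " + base + "caf\u00e9"
print(repr(combined))  # AltStr(b'Re: Subject: caf\xe9', 'latin-1')

try:
    base + "\u3046"  # not representable in latin-1
    coercion_failed = False
except UnicodeEncodeError:
    coercion_failed = True
```

Keeping the rule in a method on the altstr side means an experimental type can change its coercion policy without touching `str`.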

This is still smelling an awful lot like the 2.x str type to me, but
supporting a __coerce_str__ method may allow some useful
experimentation in this space (as PJE suggested). There's a chance it
would be abused, but it offers a greater chance of success than trying
to come up with a concrete altstr type without providing a means for
experimentation first.

> (In principle, you could then drop *all* the stringlike methods from
> plain-old-bytes objects.  If it's really text-in-bytes you want, you should
> use an ebytes with the encoding specified.)

Except that a lot of those string-like methods are just plain useful,
even when you *know* you're dealing with an octet stream rather than
latin-1 encoded text.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi

On Mon, Jun 21, 2010 at 04:52:08PM -0500, John Arbash Meinel wrote:
> 
> ...
> >> IOW, if you're producing output that has to go into another system
> >> that doesn't take unicode, it doesn't matter how
> >> theoretically-correct it would be for your app to process the data in
> >> unicode form.  In that case, unicode is not a feature: it's a bug.
> >>
> > This is not always true.  If you read a webpage, chop it up so you get
> > a list of words, create a histogram of word length, and then write the output as
> > utf8 to a database.  Should you do all your intermediate string operations
> > on utf8 encoded byte strings?  No, you should do them on unicode strings as
> > otherwise you need to know about the details of how utf8 encodes characters.
> > 
> 
> You'd still have problems in Unicode given stuff like å =~ å even though
> u'\xe5' vs u'a\u030a' (those will look the same depending on your
> Unicode system. IDLE shows them pretty much the same, T-Bird on Windosw
> with my current font shows the second as 2 characters.)
> 
> I realize this was a toy example, but it does point out that Unicode
> complicates the idea of 'equality' as well as the idea of 'what is a
> character'. And just saying "decode it to Unicode" isn't really sufficient.
> 
Ah -- but if you're dealing with unicode objects you can use the
unicodedata.normalize() function on them to come out with the right values.
If you're using bytes, it's yet another case where you, the programmer, have
to know what byte sequences represent combining characters in the particular
encoding that you're dealing with.
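Here is the `unicodedata.normalize()` suggestion in code form. Normalizing both strings to the same form (NFC by default here; NFD works equally well as long as both sides use it) makes the two spellings of å compare equal without any knowledge of a byte encoding. A minimal sketch using the standard `unicodedata` module; `same_text` is a hypothetical helper name:

```python
import unicodedata

precomposed = "\u00e5"
decomposed = "a\u030a"

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    # Compare two strings after bringing both into the same normalization form.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_text(precomposed, decomposed))                       # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```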

-Toshio



Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Martin v. Löwis


> The test_posix failure is a regression from 2.6 (but it only shows up on
> some machines - it is caused by a fairly braindead implementation of a
> couple of posix apis by Apple apparently).
>
> http://bugs.python.org/issue7900


Ah, that one. I definitely think this should *not* block the release:
a) there is no clear solution in sight. So if we wait for it to be resolved,
   it could take months until we get a 2.7 release.
b) it's only about getgroups - a fairly minor API.
c) IIUC, it only affects users who are members of more than 16
   groups - a fairly uncommon setup.

Regards,
Martin

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Nick Coghlan

> There also seem to be a couple of failures left with test_gdb...

Do you mean the compiler and debugger specific issues reported in
http://bugs.python.org/issue8482?

Fixing that properly is messy, and according to Victor's last message,
even the correct conditions for skipping the test aren't completely
clear.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Paul Moore

On 21 June 2010 22:57, Michael Foord  wrote:
>> Two of the other failures I'm pretty sure are problems in the test suite
>> rather than bugs (as Bill said) and I'm not sure about the ctypes issue.
>> Just starting a full build here.
>
> Right now I'm *only* seeing these two failures on Mac OS X (10.6.4):
>
>    test_posix test_urllib2_localnet

I'm still seeing a test_ctypes failure (on Windows XP). Not sure if
it's the same one Bill was seeing:

FAIL: test_issue_8959_b (ctypes.test.test_callbacks.SampleCallbacksTestCase)
--
Traceback (most recent call last):
  File "C:\buildslave\trunk.moore-windows\build\lib\ctypes\test\test_callbacks.py", line 208, in test_issue_8959_b
    self.assertFalse(windowCount == 0)
AssertionError: True is not False

Looks like this test was added today, and counts the windows. As my
buildbot is running as a service, and I generally leave it running
when logged off, a window count of 0 may well be correct - I can't be
sure. So my view is that it's possibly a bug in the test - but it
could do with someone more expert to confirm this.

I've got a build running at the moment, when it's finished I'll rerun
the trunk build (I currently have a disconnected session with a window
open, I'll see if that makes it pass).

Paul.

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Paul Moore

On 21 June 2010 23:19, Paul Moore  wrote:
> On 21 June 2010 22:57, Michael Foord  wrote:
>>> Two of the other failures I'm pretty sure are problems in the test suite
>>> rather than bugs (as Bill said) and I'm not sure about the ctypes issue.
>>> Just starting a full build here.
>>
>> Right now I'm *only* seeing these two failures on Mac OS X (10.6.4):
>>
>>    test_posix test_urllib2_localnet
>
> I'm still seeing a test_ctypes failure (on Windows XP). Not sure if
> it's the same one Bill was seeing:
>
> FAIL: test_issue_8959_b (ctypes.test.test_callbacks.SampleCallbacksTestCase)
> --
> Traceback (most recent call last):
>  File 
> "C:\buildslave\trunk.moore-windows\build\lib\ctypes\test\test_callbacks.py",
> line 208, in test_issue_8959_b
>    self.assertFalse(windowCount == 0)
> AssertionError: True is not False
>
> Looks like this test was added today, and counts the windows. As my
> buildbot is running as a service, and I generally leave it running
> when logged off, a window count of 0 may well be correct - I can't be
> sure. So my view is that it's possibly a bug in the test - but it
> could do with someone more expert to confirm this.
>
> I've got a build running at the moment, when it's finished I'll rerun
> the trunk build (I currently have a disconnected session with a window
> open, I'll see if that makes it pass).

Yes, looks like it's a bug in the test. http://bugs.python.org/issue9055 raised.

Paul.

Re: [Python-Dev] red buildbots on 2.7

2010-06-21 Thread Stefan Krah

Bill Janssen  wrote:
> % make test
> [...]
> test_uuid
> test test_uuid failed -- Traceback (most recent call last):
>   File "/private/tmp/Python-2.7rc2/Lib/test/test_uuid.py", line 472, in 
> testIssue8621
> self.assertNotEqual(parent_value, child_value)
> AssertionError: '8395a08e40454895be537a180539b7fb' == 
> '8395a08e40454895be537a180539b7fb'
> 
> [...]

I reopened http://bugs.python.org/issue8621 . Could you comment there
and help resolve the test failure?


Stefan Krah



Re: [Python-Dev] Adding additional level of bookmarks and section numbers in python pdf documents.

2010-06-21 Thread Terry Reedy


On 6/21/2010 4:07 PM, Peng Yu wrote:
> Hi,
>
> The current PDF version of the Python documentation doesn't have bookmarks
> for subsubsections. For example, there is no bookmark for the following
> section in python_2.6.5_reference.pdf. Also, the bookmarks don't have
> section numbers in them; I suggest including the section numbers.
> Could these features be added in a future release of the Python
> documentation?
>
> 3.4.1 Basic customization


Search doc issues on the tracker for this topic and file a feature 
request doc issue if there is not one.


--
Terry Jan Reedy


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Terry Reedy


On 6/21/2010 3:59 PM, Steve Holden wrote:

Terry Reedy wrote:

On 6/21/2010 8:33 AM, Nick Coghlan wrote:


P.S. (We're going to have a tough decision to make somewhere along the
line where docs.python.org is concerned, too - when do we flick the
switch and make a 3.x version of the docs the default?


Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'.
Trunk releases always take over docs.python.org. To do otherwise would
be to say that 3.2 is not a real trunk release and not yet ready for
real use -- a major slam.

Actually, I thought this was already discussed and decided ;-).


This also gives the 2.7 release its day in the sun before relegation to
maintenance status.


Every new version (except 3.0 and 3.1) has gone to maintenance status 
*and* become the featured release on docs.python.org the day it was 
released.  2.7 would just spend less time as the featured release on 
that page.



The Python 3 documents, when they become the default, should contain an
every-page link to the Python 2 documentation (though linkages may be a
problem - they could probably be done at a gross level).


docs.python.org contains links to docs to other releases, both past and 
future. There is no reason to treat 3.2 specially, or to junk up its 
pages. The 3.x docs have intentionally been cleaned of nearly all 
references to 2.x. The current 2.6 and 2.7 pages have no references to 
corresponding 3.1 pages.


Terry Jan Reedy


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw

On Jun 22, 2010, at 08:03 AM, Nick Coghlan wrote:

>On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby  wrote:
>> True, but making it a separate type with a required encoding gets rid of the
>> magical "I don't know" - the "I don't know" encoding is just a plain old
>> bytes object.
>
>So, to boil down the ebytes idea, it is basically a request for a
>second string type that holds an octet stream plus an encoding name,
>rather than a Unicode character stream. Calling it "ebytes" seems to
>emphasise the wrong parallel in that case (you have a 'str' object
>with a different internal structure, not any kind of bytes object).
>For now I'll call it an "altstr". Then the idea can be described as

Actually no.  We're introducing a second bytes type that holds an octet stream
plus an encoding name.  See the toy implementation I included in a previous
message.

As opposed to say a bytes object that represented an image, which would make
almost no sense to decode to a unicode, this ebytes type would help bridge the
gap between a pure bytes object and a pure unicode object.  It would know how
to accurately convert to a unicode (i.e. __str__()) because it would know the
encoding of the bytes.  Obviously, it could convert to a pure bytes object.
Because it can be accurately stringified, it can have most, if not all, of
the str API.
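Because the object knows its encoding, str methods can be offered by decoding, operating on text, and re-encoding any text result. The sketch below is NOT Barry's toy implementation from his earlier message, which is not reproduced here. `EBytes` is a hypothetical name, and forwarding through `__getattr__` is just one simple way to borrow the str API:

```python
class EBytes:
    """Hypothetical ebytes: bytes plus the encoding that decodes them."""

    def __init__(self, data: bytes, encoding: str):
        data.decode(encoding)  # fail early if bytes and encoding disagree
        self._data = bytes(data)
        self.encoding = encoding

    def __str__(self) -> str:
        # Accurate conversion to text is possible because the encoding is known.
        return self._data.decode(self.encoding)

    def __bytes__(self) -> bytes:
        return self._data

    def __getattr__(self, name):
        # Only called for attributes not found normally: borrow them from str.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private lookups
        attr = getattr(str(self), name)  # AttributeError if str lacks it too
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Re-wrap text results so they stay EBytes in the same encoding;
            # other results (bool, int, list, ...) pass through unchanged.
            if isinstance(result, str):
                return EBytes(result.encode(self.encoding), self.encoding)
            return result

        return wrapper


eb = EBytes("\u3046\u308b\u3055\u3044 mail".encode("euc-jp"), "euc-jp")
upper = eb.upper()               # str method; result re-encoded as EBytes
print(str(upper))                # うるさい MAIL
print(eb.startswith("\u3046"))   # True
```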

-Barry

