Re: [Python-Dev] Inclusive Range

2010-10-04 Thread Xavier Morel
On 2010-10-04, at 05:04 , Eviatar Bach wrote:
 Hello,
 
 I have a proposal to make the range() function inclusive; that is,
 range(3) would generate 0, 1, 2, and 3, as opposed to 0, 1, and 2. Not only
 is it more intuitive, it also seems to be used often, with coders often
 writing range(0, example+1) to get the intended result. It would be easy to
 implement, and though significant, is not any more drastic than changing
 print to a function in Python 3. Of course, if this were done, slicing
 behaviour would have to be adjusted accordingly.
 
 What are your thoughts?

Same as the others:
0. This is a discussion for python-ideas, I'm CCing that list
1. This is a major backwards compatibility breakage, and one which is entirely 
silent (`print` from keyword to function wasn't)
2. It loses not only well-known behavior but interesting properties as well 
(``range(n)`` has exactly ``n`` elements; with your proposal it has ``n+1``, 
breaking ``for i in range(5)`` as a way to iterate 5 times, as well as ``for i in 
range(len(collection))`` for cases where e.g. ``enumerate`` is not good enough 
or is too slow)
3. As well as the relation between range and slices
4. I fail to see how it is more intuitive (let alone more practical, see 
previous points)
5. If you want an inclusive range, I'd recommend proposing a flag on `range` 
(e.g. ``inclusive=True``) rather than such a drastic breakage of ``range``'s 
behavior. That, at least, might have a chance. Changing the existing default 
behavior of range most definitely doesn't.
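The properties from point 2, and the flag idea from point 5, can be sketched concretely (``irange`` is a hypothetical helper written for illustration, not an actual or proposed ``range`` API):

```python
# Properties the proposal would break:
assert len(range(5)) == 5                 # range(n) has exactly n elements
assert list(range(3)) == [0, 1, 2]
# the range/slice relation:
assert 'abcde'[2:4] == ''.join('abcde'[i] for i in range(2, 4))

# A non-breaking alternative: an explicit inclusive-range helper.
def irange(start, stop, step=1):
    """Inclusive range: irange(0, 3) -> 0, 1, 2, 3."""
    offset = 1 if step > 0 else -1
    return range(start, stop + offset, step)

assert list(irange(0, 3)) == [0, 1, 2, 3]
assert list(irange(3, 0, -1)) == [3, 2, 1, 0]
```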

I'd be −1 on your proposal, −0 on adding a flag to ``range`` (I can't recall 
the half-open ``range`` having bothered me recently, if ever)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Scott Dial
On 10/2/2010 7:00 PM, R. David Murray wrote:
 The clever hack (thanks ultimately to Martin) is to accept 8bit data
 by encoding it using the ASCII codec and the surrogateescape error
 handler.

I've seen this idea pop up in a number of threads. I worry that you are
all inventing a new kind of dual that is a direct parallel to Python 2.x
strings. That is to say,

3.x> b = b'\xc2\xa1'
3.x> s = b.decode('utf8')
3.x> v = b.decode('ascii', 'surrogateescape')

, where s and v should be the same thing in 3.x but they are not due
to an encoding trick. I believe this trick generates more-or-less the
same issues as strings did in 2.x:

2.x> b = '\xc2\xa1'
2.x> s = b.decode('utf8')
2.x> v = b

Any reasonable 2.x code has to guard on str/unicode and it would seem in
3.x, if this idiom spreads, reasonable code will have to guard on
surrogate escapes (which actually seems like a more expensive test). As in,

3.x> print(v)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in
position 0: surrogates not allowed
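The transcript above is reproducible as a runnable script (the UnicodeEncodeError is what print() hits on a UTF-8 terminal):

```python
b = b'\xc2\xa1'
s = b.decode('utf8')                      # a real string: '\xa1'
v = b.decode('ascii', 'surrogateescape')  # bytes in drag: '\udcc2\udca1'
assert s != v

# print(v) fails at I/O (encode) time, as in the traceback above:
try:
    v.encode('utf-8')
    raised = False
except UnicodeEncodeError as e:
    raised = 'surrogates not allowed' in str(e)
assert raised
```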

It seems like this hack is about making the 3.x unicode type more like
the 2.x string type, and I thought we decided that was a bad idea. How
will developers not have to ask themselves whether a given string is a
real string or a byte sequence masquerading as a string? Am I missing
something here?

-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread R. David Murray
On Mon, 04 Oct 2010 12:32:26 -0400, Scott Dial <scott+python-...@scottdial.com>
wrote:
 On 10/2/2010 7:00 PM, R. David Murray wrote:
  The clever hack (thanks ultimately to Martin) is to accept 8bit data
  by encoding it using the ASCII codec and the surrogateescape error
  handler.
 
 I've seen this idea pop up in a number of threads. I worry that you are
 all inventing a new kind of dual that is a direct parallel to Python 2.x
 strings.

Yes, that is exactly my worry.

 That is to say,
 
 3.x> b = b'\xc2\xa1'
 3.x> s = b.decode('utf8')
 3.x> v = b.decode('ascii', 'surrogateescape')
 
 , where s and v should be the same thing in 3.x but they are not due
 to an encoding trick.

Why should they be the same thing in 3.x?  One is an ASCII string with
some escaped bytes in an unknown encoding, the other is a valid unicode
string.  The surrogateescape trick is used only when we don't *know*
the encoding (a priori) of the bytes in question.

 I believe this trick generates more-or-less the same issues as strings
 did in 2.x:
 
 2.x> b = '\xc2\xa1'
 2.x> s = b.decode('utf8')
 2.x> v = b

The difference is that in 2.x people could and would operate on strings as
if they knew the encoding, and get in trouble.  In 3.x you can't do that.
If you've got escaped bytes you *know* that you don't know the encoding,
and the program can't get around that except by re-encoding to bytes
and properly decoding them.
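That round trip is lossless; a minimal illustration (the Latin-1 bytes are an invented example, not email-package code):

```python
raw = b'caf\xe9'                            # 'caf\xe9' is 'cafe'+acute in Latin-1
v = raw.decode('ascii', 'surrogateescape')  # 'caf\udce9' -- encoding unknown
assert v == 'caf\udce9'

# Re-encode to recover the original bytes exactly, then decode properly
# once the real encoding is known:
assert v.encode('ascii', 'surrogateescape') == raw
assert raw.decode('latin-1') == 'caf\xe9'
```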

 Any reasonable 2.x code has to guard on str/unicode and it would seem in
 3.x, if this idiom spreads, reasonable code will have to guard on
 surrogate escapes (which actually seems like a more expensive test). As in,
 
 3.x> print(v)
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in
 position 0: surrogates not allowed

Right, I mentioned that concern in my post.

In this case at least, however, the *goal* is that the surrogates are
never seen outside the email internals.  In reflection of this, my latest
thought is that I should add a 'message_from_binary_file' helper method
and a 'feedbytes' method to feedparser, making the surrogates a 100%
internal implementation detail[*].  Only if the email package contains a
coding error would the surrogates escape and cause problems for user
code.

 It seems like this hack is about making the 3.x unicode type more like
 the 2.x string type, and I thought we decided that was a bad idea. How
 will developers not have to ask themselves whether a given string is a
 real string or a byte sequence masquerading as a string? Am I missing
 something here?

I think this question is something that needs to be considered any
time using surrogates is proposed.  I hope that in the email package
proposal I've addressed it.  What do you think?

--David

[*] And you are right that there is a performance concern as a result
of needing to detect surrogates at various points in the code.
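A sketch of such a surrogate check (a hypothetical helper for illustration, not part of the patch):

```python
import re

# surrogateescape smuggles each unknown byte 0x80-0xFF in as U+DC80..U+DCFF
_ESCAPED = re.compile('[\udc80-\udcff]')

def has_escaped_bytes(s):
    """True if s carries bytes smuggled in by the surrogateescape handler."""
    return _ESCAPED.search(s) is not None

assert not has_escaped_bytes('plain ascii header')
assert has_escaped_bytes(b'\xc2\xa1'.decode('ascii', 'surrogateescape'))
```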


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 07:00 PM, R. David Murray wrote:

The advantage of this patch is that it means Python3.2 can have an
email module that is capable of handling a significant proportion of
the applications where the ability to process binary email data is
required.

Like others, I'm concerned that we're perpetuating the Python 2 problems with
bytes vs. strings.  OTOH, I went down a similar road (though much more hacky
and less successful) in one of my failed branches, so I sympathize with this
nod to practicality that actually works.

If the choice is the current brokenness staying in Python 3.2 or this hack
being added for now, I'd go with the latter.  email6 will make it all better,
right? :)  In the meantime, I do think it would be good to give our users
something that's practical.

I've uploaded the patch to issue 4661 (http://bugs.python.org/issue4661). I
uploaded it to rietveld as well just before Martin's announcement. After the
announcement I uploaded the svn patch to the tracker, so hopefully there will
be an automated review button as well.  Here is your chance to exercise the
new review tools :)

I see no automatically generated link to the review, but I did add some
comments to the Rietveld issue you linked to in one of your comments.

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 10:36 AM, Georg Brandl wrote:

Am 02.10.2010 00:06, schrieb Barry Warsaw:

 The reason is that the import.c logic that uses the struct filedescr
 tables built from _PyImport_DynLoadFiletab is just not smart enough
 to handle this case.  All it knows about are suffixes, and for
 backwards compatibility, we have dynload_shlib.c matching
 both .SOABI.so *and* bare .so.  So when it's searching the
 directories for .cpython-32m.so files, it finds
 the .cpython-32dmu.so file first based on the bare .so match.

I don't understand -- wouldn't foo.sometag.so (where sometag is not
SOABI) only be found a match for a suffix of .so if the module name
requested is foo.sometag?  (And if not, isn't implementing that the
easiest solution?)

Yep, my analysis was faulty.  Python's own import machinery does exactly the
right thing.  I think the problem is in distribute, which writes a _foo.py
file that bootstraps into loading the wrong .so file.  E.g. for an extension
names _stupid, you end up with this _stupid.py in the egg:

def __bootstrap__():
   global __bootstrap__, __loader__, __file__
   import sys, pkg_resources, imp
   __file__ = 
pkg_resources.resource_filename(__name__,'_stupid.cpython-32dmu.so')
   __loader__ = None; del __bootstrap__, __loader__
   imp.load_dynamic(__name__,__file__)
__bootstrap__()

Python's built-in import finds _stupid.py, but this is hardcoded to the
build-flags of the last Python used to install the package.  If instead this
looked like:

def __bootstrap__():
   global __bootstrap__, __loader__, __file__
   import sys, pkg_resources, imp, sysconfig
   __file__ = 
pkg_resources.resource_filename(__name__,'_stupid.{soabi}.so'.format(soabi=sysconfig.get_config_var('SOABI')))
   __loader__ = None; del __bootstrap__, __loader__
   imp.load_dynamic(__name__,__file__)
__bootstrap__()

then everything works out just fine.  If you install only the 'dmu' version
and try to import it with the 'm' Python, you get an ImportError as expected
(i.e. not a crash!).
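The sysconfig lookup used in the fixed bootstrap can be tried standalone; the value of SOABI varies per build (e.g. 'cpython-32m' vs 'cpython-32dmu'), which is exactly why it must be looked up at import time rather than hardcoded:

```python
import sysconfig

# The tag distribute should interpolate instead of baking it in at install time:
soabi = sysconfig.get_config_var('SOABI')   # e.g. 'cpython-32m' or 'cpython-32dmu'
filename = '_stupid.{soabi}.so'.format(soabi=soabi)
```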

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 04:50 PM, Nick Coghlan wrote:

On Sat, Oct 2, 2010 at 10:24 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
 On Fri, 1 Oct 2010 20:06:57 -0400
 Barry Warsaw <ba...@python.org> wrote:

 With my branch, you'll end up with this in /tmp/python:

     bin/python3.2m   - the normal build binary
     bin/python3.2dmu - the wide+pydebug build binary
     bin/python3.2m-config
     bin/python3.2dmu-config

 Do users really want to see such idiosyncratic suffixes?

Ordinary users won't be building Python from source. Developers won't
care so long as we clearly document the sundry suffixes and describe
them in the README (or in a PEP, with a pointer from the README).

I agree with this.

     lib/libpython3.2.so.1.0.m
     lib/libpython3.2.so.1.0.dmu

 Ditto here. This seems to break well-known conventions.
 If I look at /usr/lib{,64} on my machine, I can't see a single
 shared libary file that ends neither in .so nor .so.some
 digits.

Having some characters on the end to flag different kinds of custom
build seems like it fits within the .so naming conventions I'm aware
of, but I'm sure the *nix packaging folks will pipe up if Barry starts
wandering too far afield in this area.

Because -Wl,-h is used, the right soname will get compiled in, so it will
generally just DTRT.  The situation where it can break is if you are not using
distutils to build things.  However, in that case, the symlinks added by 'make
install' should still make things just work.  However, if you don't use
distutils and still want to link against the correct shared library on a
multi-build system, you'll have to modify your rules to grab the build flags
from the Makefile (the easiest route being via sysconfig).

 Before trying to find a solution to your problem, I think it would be
 nice to get a consensus that this is really a desired feature.

Having multiple parallel altinstall installations be genuinely
non-interfering out of the box certainly seems like a desirable
feature to me.

Right.  Isn't that kind of the whole point of altinstall? :)  Well, multiple
coexisting versions of Python, but I think multiple coexisting builds of the
same Python version falls into the same category.

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 09:44 AM, Martin v. Löwis wrote:

 With my branch, you'll end up with this in /tmp/python:

 bin/python3.2m   - the normal build binary
 bin/python3.2dmu - the wide+pydebug build binary
 bin/python3.2m-config
 bin/python3.2dmu-config

 Do users really want to see such idiosyncratic suffixes?
 
 Ordinary users won't be building Python from source. Developers won't
 care so long as we clearly document the sundry suffixes and describe
 them in the README (or in a PEP, with a pointer from the README).

I think this is not true. Developers *will* care, and they will cry
foul very loudly, asking what nonsense this is. Antoine is proof of
that: he is a developer, and he understands the motivation well,
but it still goes against his notions of beauty (channeling him here).

Well, it may be surprising at first, but since it doesn't break any normal
usage I don't think most developers will care.  But I could be wrong.

 Having multiple parallel altinstall installations be genuinely
 non-interfering out of the box certainly seems like a desirable
 feature to me.

I think this should not use automatically generated suffixes, though.
Perhaps I want an altinstall that is restricted in some way?
Or one where user peter has write access into site-packages?

I'm not sure how this relates to the suffix question...

I could accept that a suffix is parameter to configure (or some such),
and then gets used throughout. By default, Python will not add a
suffix. However, I still wonder why people couldn't just install
Python in a different prefix if they want separate installations.

For a distro, all those Python binaries have to go in /usr/bin.  We already
symlink /usr/bin/python to pythonX.Y so I don't see the harm in a few extra
symlinks.

However, if people *really* don't want to see this by default then I can think
of a few options:

* Enable the extra build-flag suffixes through a configure option and/or
  new Makefile target.  Could end up duplicating the altinstall rule if the
  current rule can't be refactored easily.

* Expose just the necessary low-level stuff to allow the distro installation
  scripts to move things around and create the symlinks after the fact.  This
  would mean that other distros (or from-source installers) wouldn't benefit
  from the isolation feature without some extra work on their part though.  It
  would be nice if this was a feature everybody could just have.

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 01:40 PM, Antoine Pitrou wrote:

Besides, mingling different installations together makes uninstalling
much more difficult.

Not for a distro I think.

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Barry Warsaw
On Oct 01, 2010, at 07:36 PM, Benjamin Peterson wrote:

2010/10/1 Barry Warsaw <ba...@python.org>:
 I can think of a couple of ways out, none of which are totally
 satisfying. Probably the easiest out is to change the PEP 3149
 naming so that the files don't end in .so.  E.g. use this instead:

    foo.cpython-32dmu-so
    foo.cpython-32m-so

-1 Doesn't that break not only Python's convention for extensions on
shared modules but also any *nix shared object?

It shouldn't (i.e. if -Wl,-h is used to get the soname compiled in there), but
still I don't like it, and now I know it's not necessary.

 or similar.  I think that'd be a fairly straightforward change, but
 it might break some useful assumptions (we'd still fall back to .so
 of course).

 Other ideas:

 - make import.c smarter so that you can match against other than
 just the suffix.  probably a lot of work.

Although it would be more work, I think this is the most correct
option.

Phew! Extra work averted! (see my other reply :).

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Antoine Pitrou
On Mon, 4 Oct 2010 14:41:11 -0400
Barry Warsaw <ba...@python.org> wrote:
 
 For a distro, all those Python binaries have to go in /usr/bin.  We already
 symlink /usr/bin/python to pythonX.Y so I don't see the harm in a few extra
 symlinks.

Why would a distro want to provide all combinations of Python builds?

One important issue for me is guessability. While "d" is
reasonably guessable (and "dbg" or "debug" would be even better), "u"
and "m" are not.
(actually, "u" could lead to misunderstandings such as "is this a
unicode-enabled version of Python?"; as for "m", I don't know what it's
for)

As for the SOABI, you could use a different mangling which would
preserve the .so suffix -- e.g. "-debug.so" instead of ".so.d". At
least then well-known conventions would be preserved.

Regards

Antoine.




Re: [Python-Dev] Pronouncement needed in issue9675

2010-10-04 Thread Larry Hastings



Hi--sorry to be a little late to the discussion.  I'm the putz who 
backported capsules (and monkeyed with CObject) for 2.7.



On 09/27/2010 07:44 PM, Jesus Cea wrote:

http://bugs.python.org/issue9675

Long story short: Python 2.7 backported Capsule support and
(incorrectly, in my opinion) marked CObject as deprecated.
   


Not strictly true.  CObject is marked with a Pending Deprecation 
warning.  But this is still turned into an error by -We.



All C modules in the stdlib were updated to Capsule (with a CObject
compatibility layer), except BSDDB, because this change was done late in
the cycle, the proposed patch was buggy (solvable) and a pronouncement
was done that CObject was not actually deprecated.
   


I did the stdlib conversion from CObjects to capsules.  I was told it'd 
be improper for me to convert bsddb to capsules because it's externally 
maintained (by you!).  You and I discussed converting bsddb to capsules 
for 2.7 at the very last minute but it didn't happen.



Since I think that adopting Capsule in BSDDB for 2.7.1 would break the
API compatibility (maybe the CObject proxy would solve this), and since
 a previous pronouncement was made about CObject not being deprecated in 2.7.x,
I would like comments.
   


By "CObject proxy" I assume you mean CObject support for opening 
capsules.  Specifically: PyCObject_AsVoidPtr()--and therefore functions 
that call it, such as PyCObject_Import()--understand capsules.  So yes, 
if the bsddb that shipped with Python 2.7.1 used a capsule, external 
users /should/ work unchanged.  I further theorize that external callers 
would continue to work fine even if they'd been compiled against 
previous versions of Python (though I admit I've never tried it).




On 09/28/2010 06:49 AM, Guido van Rossum wrote:
My guess is that the warning was added to 2.7 before it was clear that 
there would never be a 2.8.


I understood at the time how unlikely it is that there will ever be a 
2.8.  I'm just that paranoid/crazy.  Even if the Python core dev 
community never releases a 2.7 (or 2.8), external luddites might, and I 
wanted my foot in the door.



I'm fine with changing the warning so it's only active with -3.  I still 
want it left as a PendingDeprecationWarning; PyErr_WarnPy3k() issues a 
DeprecationWarning, and it'd be inappropriate to upgrade it to that.  
(Though if there ever is a 2.8 (crosses self) maybe we could drop the -3 
requirement then?)  I'd be willing to contribute this change.  I'd be 
happy to convert every use of PendingDeprecationWarning, or just the one 
in cobject.c, either way.



Sorry about all the wailing and gnashing of teeth,


/larry/


Re: [Python-Dev] API for binary operations on Sets

2010-10-04 Thread Larry Hastings


On 09/29/2010 08:50 PM, Raymond Hettinger wrote:
1. Liberalize setobject.c binary operator methods to accept anything 
registered to the Set ABC and add a backwards incompatible restriction 
to the Set ABC binary operator methods to only accept Set ABC 
instances (they currently accept any iterable).


This approach has a backwards incompatible tightening of the Set ABC, 
but that will probably affect very few people.  It also has the 
disadvantage of not providing a straight-forward way to handle general 
iterable arguments (either the implementer needs to write named binary 
methods like update, difference, etc for that purpose or the user will 
need to cast the iterable to a set before operating on it).   The 
positive side of this option is that keeps the current advantages of 
the setobject API and its NotImplemented return value.


1a.  Liberalize setobject.c binary operator methods, restrict SetABC 
methods, and add named methods (like difference, update, etc) that 
accept any iterable.


I prefer 1 to 1a, but either is acceptable.  1 just forces you to call 
set() on your iterable before operating on it, which I think helps 
(explicit is better than implicit).


In 1a, by "add named methods that accept any iterable", you mean add 
those to the Set ABC?  If so, then I'm more strongly for 1 and against 
1a.  I'm not convinced all classes implementing the Set ABC should have 
to implement all those methods.
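For reference, a minimal Set ABC implementation (using the modern collections.abc spelling; MySet is illustrative only) shows the current liberal behavior that option 1 would tighten:

```python
from collections.abc import Set

class MySet(Set):
    """Smallest possible Set ABC implementation: the three abstract methods."""
    def __init__(self, items=()):
        self._items = frozenset(items)
    def __contains__(self, x):
        return x in self._items
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)

# Today the mixin binary operators accept any iterable, not just Set instances:
assert set(MySet({1, 2}) | [2, 3]) == {1, 2, 3}
# Under option 1 the list operand would return NotImplemented,
# forcing an explicit cast first:
assert set(MySet({1, 2}) | MySet([2, 3])) == {1, 2, 3}
```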



/larry/


Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Barry Warsaw
On Oct 04, 2010, at 09:10 PM, Antoine Pitrou wrote:

On Mon, 4 Oct 2010 14:41:11 -0400
Barry Warsaw <ba...@python.org> wrote:
 
 For a distro, all those Python binaries have to go in /usr/bin.  We
 already symlink /usr/bin/python to pythonX.Y so I don't see the harm
 in a few extra symlinks.

Why would a distro want to provide all combinations of Python builds?

Maybe not all, but definitely several.  At least a normal build and a debug
build, but a wide unicode build possibly also.

One important issue for me is guessability. While "d" is
reasonably guessable (and "dbg" or "debug" would be even better), "u"
and "m" are not.
(actually, "u" could lead to misunderstandings such as "is this a
unicode-enabled version of Python?"; as for "m", I don't know what it's
for)

I think symlinks will make this discoverable.  I like that the binary name's
suffix flags matches the flags used in PEP 3149, which also makes it easy to
document.  I could imagine python3-dbg would be symlinked to python3.2d (or
whatever).

As for the SOABI, you could use a different mangling which would
preserve the .so suffix -- e.g. "-debug.so" instead of ".so.d". At
least then well-known conventions would be preserved.

We already have libpython3.2.so.1.0 which also doesn't end in .so.  I suppose
we could put the build flags before the .so. part, but I think Matthias had a
problem with that (I don't remember the details).

-Barry




Re: [Python-Dev] issue 9807 - a glitch in coexisting builds of different types

2010-10-04 Thread Antoine Pitrou
On Mon, 4 Oct 2010 16:01:17 -0400
Barry Warsaw <ba...@python.org> wrote:
 
 Why would a distro want to provide all combinations of Python builds?
 
 Maybe not all, but definitely several.  At least a normal build and a debug
 build, but a wide unicode build possibly also.

What is the point of shipping a different unicode representation? Is
there any practical use case? I could understand a motivated user
trying different build flags for the purpose of experimentation and
personal enlightenment, but a Linux distribution?
(also, Debian's Python already defaults on wide unicode)

 As for the SOABI, you could use a different mangling which would
 preserve the .so suffix -- e.g. "-debug.so" instead of ".so.d". At
 least then well-known conventions would be preserved.
 
 We already have libpython3.2.so.1.0 which also doesn't end in .so.

".so.<number>" is a well-understood Unix convention, while
".so.<some additional letters>" doesn't seem to be.
(this also means that tools such as file managers etc. may not display
the file type properly)

Regards

Antoine.




Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Stephen J. Turnbull
R. David Murray writes:
  On Mon, 04 Oct 2010 12:32:26 -0400, Scott Dial 
  <scott+python-...@scottdial.com> wrote:
   On 10/2/2010 7:00 PM, R. David Murray wrote:
The clever hack (thanks ultimately to Martin) is to accept 8bit data
by encoding it using the ASCII codec and the surrogateescape error
handler.
   
   I've seen this idea pop up in a number of threads. I worry that you are
   all inventing a new kind of dual that is a direct parallel to Python 2.x
   strings.
  
  Yes, that is exactly my worry.

I don't worry about this.  Strings generated by decoding with
surrogate-escape are *different* from other strings: they contain
invalid code units (the naked surrogates).  These cannot be encoded
except with a surrogate-escape flag to .encode(), and sane developers
won't do that unless they know precisely what they're doing.  This is
not true with Python 2 strings, where all bytes are valid.

   Any reasonable 2.x code has to guard on str/unicode and it would seem in
   3.x, if this idiom spreads, reasonable code will have to guard on
   surrogate escapes (which actually seems like a more expensive test).
  
  Right, I mentioned that concern in my post.

Again, I don't worry about this.  It is *not* an *extra* cost.  Those
messages are *already broken*, they *will* crash the email module if
you fail to guard against them.  Decoding them to surrogates actually
makes it easier to guard, because you know that even if broken
encodings are present, the parser will still work.  Broken encodings
can no longer crash the parser.  That is a Very Good Thing IMHO.

  Only if the email package contains a coding error would the
  surrogates escape and cause problems for user code.

I don't think it is reasonable to internalize surrogates that way;
some applications *will* want to look at them and do something useful
with them (delete them or replace them with U+FFFD or ...).  However,
I argue below that the presence of surrogates already means the user
code is under fire, and this puts the problem in a canonical form so
the user code can prepare for it (if that is desirable).

   It seems like this hack is about making the 3.x unicode type more like
   the 2.x string type,

Not at all.  It's about letting the parser be a parser, and letting
the application handle broken content, or discard it, or whatever.
Modularity is improved.  This has been a major PITA for Mailman
support over the years: every time the spammers and virus writers come
up with a new idea, there's a chance it will leak out and the email
parser will explode, stopping the show.  These kinds of errors are a
FAQ on the Mailman lists (although much less so in recent years).

   How will developers not have to ask themselves whether a given
   string is a real string or a byte sequence masquerading as a
   string? Am I missing something here?

There are two things to say, actually.  First, you're in a war zone.
*All* email is bytes sequences masquerading as text, and if you're not
wearing armor, you're going to get burned.  The idea here is to have
the email package provide the armor and enough instrumentation so you
can do bomb detection yourself (or perhaps just let it blow, if you're
hacking up a quick and dirty script).

Second, there are developers who will not care whether strings are
real or byte sequences in drag, because they're writing MTAs and
the like.  Those people get really upset, and rightly so, when the
parser pukes on broken headers; it is not their app's job at all to
deal with that breakage.

  I think this question is something that needs to be considered any
  time using surrogates is proposed.

I don't agree.  The presence of naked surrogates is *always* (assuming
sane programmers) an indication of invalid input.  The question is,
should the parser signal invalidity, or should it allow the
application to decide?  The email module *doesn't have enough
information to decide* whether the invalid input is a real problem,
or how to handle it (cf the example of a MTA app).  Note that a
completely naive app doesn't care -- it will crash either way because
it doesn't handle the exception, whether it's raised by the parser or
by a codec when the app tries to do I/O.  A robust app *does* care: if
the parser raises, then the app must provide an alternative parser
good enough to find and fix the invalid bytes.  Clearly it's much
better to pass invalid (but fully parsed) text back to the app in this
case.

Note that if the app really wants the parser to raise rather than pass
on the input, that should be easy to implement at fairly low cost; you
just provide a variable rather than hardcoding the surrogate-escape
flag.
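That knob could be as simple as threading an errors argument through the decode step (a hypothetical sketch, not the email package's actual API):

```python
def decode_header_bytes(raw, errors='surrogateescape'):
    # Default: smuggle unknown bytes through as lone surrogates so the
    # parser never crashes; errors='strict' makes broken input raise up front.
    return raw.decode('ascii', errors)

broken = b'Subject: caf\xe9'
assert '\udce9' in decode_header_bytes(broken)

try:
    decode_header_bytes(broken, errors='strict')
    raised = False
except UnicodeDecodeError:
    raised = True
assert raised
```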