Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Stephen J. Turnbull
Ian Bicking writes:

  Just for perspective, I don't know if I've ever wanted to deal with a URL
  like that.

Ditto, I do many times a day for Japanese media sites and Wikipedia.

  I know how it is supposed to work, and I know what a browser does
  with that, but so many tools will clean that URL up *or* won't be
  able to deal with it at all that it's not something I'll be passing
  around.

I'm not suggesting that is something you want to be passing around;
it's a presentation form, and I prefer that the internal form use
Unicode.

  While it's nice to be correct about encodings, sometimes it is
  impractical.  And it is far nicer to avoid the situation entirely.

But you cannot avoid it entirely.  Processing bytes means you are
assuming ASCII compatibility.  Granted, this is a pretty good
assumption, especially if you got the bytes off the wire, but it's not
universally so.

Maybe it's a YAGNI, but one reason I prefer the decode-process-encode
paradigm is that choice of codec is a specification of the assumptions
you're making about encoding.  So the Know-Nothing codec described
above assumes just enough ASCII compatibility to parse the scheme.
You could also have codecs which assume just enough ASCII
compatibility to parse a hierarchical scheme, etc.
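One way to picture such a "Know-Nothing" approach is the sketch below. It is only an illustration under stated assumptions: the helper name split_scheme is hypothetical, and latin-1 stands in for the codec because it maps every byte to exactly one code point, so the round trip cannot fail or alter the bytes.

```python
import re

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".",
# terminated by ":".  Nothing else about the URL is interpreted.
_SCHEME_RE = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def split_scheme(raw: bytes):
    """Return (scheme, rest); rest is handed back as untouched bytes."""
    # latin-1 decodes any byte sequence, so this step cannot raise.
    text = raw.decode("latin-1")
    match = _SCHEME_RE.match(text)
    if match is None:
        return None, raw
    scheme = match.group(1).lower()
    # Re-encoding with latin-1 restores exactly the original bytes.
    rest = text[match.end():].encode("latin-1")
    return scheme, rest


scheme, rest = split_scheme("http://ja.wikipedia.org/wiki/日本語".encode("euc-jp"))
print(scheme)
```

The only assumption made about the input is that the scheme and its colon are ASCII; everything after the colon passes through without being interpreted.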

  That is, decoding content you don't care about isn't just
  inefficient, it's complicated and can introduce errors.

That depends on the codec(s) used.

  Similarly I'd expect (from experience) that a programmer using
  Python to want to take the same approach, sticking with unencoded
  data in nearly all situations.

Indeed, a programmer using Python 2 would want to do so, because all
her literal strings are bytes by default (ie, if she doesn't mark them
with `u'), and interactive input is, too.  This is no longer so
obvious in Python 3 which takes the attitude that things that are
expected to be human-readable should be processed as str.  The obvious
example in URI space is the file:/// URL, which you'll typically build
up from a user string or a file browser, which will call the os.path
stuff which returns str.
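A minimal sketch of that file: case, assuming a POSIX-style path and UTF-8 percent-encoding (the default of urllib.parse.quote); the helper name file_url is made up for the example:

```python
import os.path
from urllib.parse import quote


def file_url(path: str) -> str:
    """Build a file: URL from a str path, staying in str until the end."""
    absolute = os.path.abspath(path)  # os.path returns str for str input
    # quote() percent-encodes as UTF-8 and leaves "/" alone by default.
    return "file://" + quote(absolute)


print(file_url("/home/user/my file.txt"))
```

Bytes only need to appear when the URL is finally written somewhere that requires them.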

Text editors and viewers will also use str for their buffers, and if
they provide a way to fish out URIs for their users, they'll probably
return str.

I won't pretend to judge the relative importance of such use cases.
But use cases for urllib which naturally favor str until you put the
URI on the wire do exist, as does the debugging presentation aspect.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] red buildbots on 2.7

2010-06-23 Thread Ronald Oussoren

On 22 Jun, 2010, at 19:05, Alexander Belopolsky wrote:

 On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren
 ronaldousso...@mac.com wrote:
 ..
 Both are valid fixes, both have both advantages and disadvantages.
 
 Your proposal:
 * Reverts to the behavior in 2.6
 * Ensures that posix.getgroups and posix.setgroups are internally consistent
 
 It is also very simple and since posix module worked fine on OSX for
 years without _DARWIN_C_SOURCE, I think this is a very low risk
 change.

I don't agree.  The patch itself is pretty simple, but it does make a rather 
significant change to the build process: the compile-time environment in 
configure would be different from the one used during the compilation of 
posixmodule.  That is, the checks for features (the HAVE_FOOBAR macros in 
pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't.  
This may lead to subtle bugs, or even compile errors (because some function 
definitions change when _DARWIN_C_SOURCE is active).

And man compat(5) says:

<quote>
32-BIT COMPILATION
     Defining _NONSTD_SOURCE causes library and kernel calls to behave as
     closely to Mac OS X 10.3's library and kernel calls as possible.  Any
     behavioral changes in this mode are documented in the LEGACY sections
     of the individual function calls.

     Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel
     calls to conform to the SUSv3 standards even if doing so would alter
     the behavior of functions used in 10.3.  Defining _POSIX_C_SOURCE also
     removes functions, types, and other interfaces that are not part of
     SUSv3 from the normal C namespace, unless _DARWIN_C_SOURCE is also
     defined (i.e., _DARWIN_C_SOURCE is _POSIX_C_SOURCE with non-POSIX
     extensions).  In any of these cases, the
     _DARWIN_FEATURE_UNIX_CONFORMANCE feature macro will be defined to the
     SUS conformance level (it is undefined otherwise).

     Starting in Mac OS X 10.5, if none of the macros _NONSTD_SOURCE,
     _POSIX_C_SOURCE or _DARWIN_C_SOURCE are defined, and the environment
     variable MACOSX_DEPLOYMENT_TARGET is either undefined or set to 10.5 or
     greater (or equivalently, the gcc(1) option -mmacosx-version-min is
     either not specified or set to 10.5 or greater), then UNIX conformance
     will be on by default, and non-POSIX extensions will also be available
     (this is the equivalent of defining _DARWIN_C_SOURCE).  For version
     values less that 10.5, UNIX conformance will be off (the equivalent of
     defining _NONSTD_SOURCE).
</quote>

My interpretation of that is that _DARWIN_C_SOURCE should be used to get SUSv3 
APIs while keeping access to Darwin-specific APIs as well. When you deploy to 
10.5 or later the compiler will set _DARWIN_C_SOURCE for you unless you set one 
of the other feature-selecting defines.

 
 My proposal:
 * Uses the newer ABI, which is more likely to be the one Apple wants you to 
 use
 
 I don't think so.  In getgroups(2) I see
 
 LEGACY DESCRIPTION
 If _DARWIN_C_SOURCE is defined, getgroups() can return more than
 {NGROUPS_MAX} groups.
 
 This suggests that this is legacy behavior.  Newer applications should
 use getgrouplist instead.

I honestly don't know why this is in the LEGACY DESCRIPTION. But as the 
functionality you get with _DARWIN_C_SOURCE was added later, I'd say that the 
behavior is intentional and not legacy. By not defining _DARWIN_C_SOURCE 
we don't necessarily get full UNIX behavior for other APIs.

 
 * Is compatible with system tools (that is, posix.getgroups() agrees with 
 id(1))
 
 I have not tested this recently, but I think if you exec id from a
 program after a call to setgroups(), it will return process groups,
 not user groups.
 
 * Is compatible with /usr/bin/python
 
 I am sure that once this issue is fixed upstream, Apple will pick it up
 with the next version.

Haha.  Apple explicitly added patches to get the current behavior instead of 
the default; what makes you think that they'll revert to the older behavior?

 
 * results in posix.getgroups not reflecting results of posix.setgroups
 
 
 This effectively substitutes getgrouplist called on the current user
 for getgroups.  In 3.x, I believe the correct action will be to
 provide direct access to getgrouplist which, while not POSIX (yet?),
 is widely available.

I don't mind adding getgrouplist, but that issue is separate from this one. 
BTW, apparently getgrouplist is POSIX 
(http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/libc.html),
 although this isn't a requirement for being added to the posix module.


It is still my opinion that the second option is preferable for better 
compatibility with system tools, even if the patch is more complicated and the 
library function we use can be considered to be broken.

Ronald




Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Stephen J. Turnbull
James Y Knight writes:

  The surrogateescape method is a nice workaround for this, but I can't  
  help thinking that it might've been better to just treat stuff as  
  possibly-invalid-but-probably-utf8 byte-strings from input, through  
  processing, to output.

This is the world we already have, modulo s/utf8/ascii + random GR
charset/.  It doesn't work, and it can't, in Japan or China or Korea,
and probably not in Russia or Kazakhstan, for some time yet.

That's not to say that byte-oriented processing doesn't have its
place.  And in many cases it's reasonable (but not secure or
bulletproof!) to assume ASCII compatibility of the byte stream,
passing through syntactically unimportant bytes verbatim.  Syntactic
analysis of such streams will surely have a lot in common with that
for text streams, so the same tools should be available.  (That's the
point of Guido's endorsement of polymorphism, AIUI.)

But it's just not reasonable to assume that will work in a context
where text streams from various sources are mixed with byte streams.
In that case, the byte streams need to be converted to text before
mixing.  (You can't do it the other way around because there is no
guarantee that the text is compatible with the current encoding of the
byte stream, nor that all the byte streams have the same encoding.)
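The point about mixing can be sketched with two made-up byte streams in different legacy encodings. The sample strings and codec choices below are assumptions chosen for illustration, not data from the thread:

```python
# Two byte streams carrying the same kind of data in different encodings.
from_tokyo = "東京".encode("shift_jis")
from_moscow = "Москва".encode("koi8_r")

# Gluing the raw bytes together gives a stream that no single codec
# describes; here it is not even valid UTF-8.
mixed_bytes = from_tokyo + b" / " + from_moscow
try:
    mixed_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Decoding each stream with its own codec first gives text that mixes safely.
mixed_text = from_tokyo.decode("shift_jis") + " / " + from_moscow.decode("koi8_r")

# surrogateescape lets undecodable bytes survive a decode/encode round trip,
# which is the workaround mentioned at the top of this message.
raw = b"caf\xe9"  # latin-1 bytes, not valid UTF-8
as_text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = as_text.encode("utf-8", errors="surrogateescape")
```

Note that surrogateescape preserves the bytes but does not recover their meaning; the escaped characters are placeholders, not readable text.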

We do need str-based implementations of modules like urllib.


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread M.-A. Lemburg
Nick Coghlan wrote:
 On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg m...@egenix.com wrote:
 It would be great if we could have something like the above as
 builtin method:

 x.split(''.as(x))
 
 As per my other message, another possible (and reasonably intuitive)
 spelling would be:
 
   x.split(x.coerce(''))

You are right: there are two ways to adapt one object to another.
You can either adapt object 1 to object 2 or object 2 to object 1.
This is what the Python2 coercion protocol does for operators.
I just wanted to avoid using that term, since Python3 removes
the coercion protocol.

 Writing it as a helper function is also possible, although it would be
 trickier to remember the correct argument ordering:
 
   import sys

   def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
       # Return obj unchanged if it already has the same type as target
       if isinstance(obj, type(target)):
           return obj
       if encoding is None:
           encoding = sys.getdefaultencoding()
       try:
           convert = obj.decode   # bytes -> str
       except AttributeError:
           convert = obj.encode   # str -> bytes
       return convert(encoding, errors)
 
   x.split(coerce_to(x, ''))
 
 Perhaps something to discuss on the language summit at EuroPython.

 Too bad we can't add such porting enhancements to Python2 anymore.
 
 Well, we can if we really want to, it just entails convincing Benjamin
 to reschedule the 2.7 final release. Given the UserDict/ABC/old-style
 classes issue, there's a fair chance there's going to be at least one
 more 2.7 RC anyway.
 
 That said, since this kind of coercion can be done in a helper
 function, that should be adequate for the 2.x to 3.x conversion case
 (for 2.x, the helper function can be defined to just return the second
 argument since bytes and str are the same type, while the 3.x version
 would look something like the code above)

True.

Note that the point of using a builtin method was to get
better performance. Such type adaptions are often needed in
loops, so adding a few extra Python function calls just to
convert a str object to a bytes object or vice-versa is a
bit much overhead.
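For reference, here is a self-contained Python 3 rendering of the quoted helper together with a usage example. The comma separator and the wrapper name split_on_comma are assumptions made for the example; the quoted call leaves the separator literal empty.

```python
import sys


def coerce_to(target, obj, encoding=None, errors="surrogateescape"):
    """Return obj converted to the type of target (str or bytes)."""
    if isinstance(obj, type(target)):
        return obj
    if encoding is None:
        encoding = sys.getdefaultencoding()
    try:
        convert = obj.decode      # bytes -> str
    except AttributeError:
        convert = obj.encode      # str -> bytes
    return convert(encoding, errors)


def split_on_comma(x):
    # The separator literal is adapted to whatever type x has.
    return x.split(coerce_to(x, ","))


print(split_on_comma("a,b"), split_on_comma(b"a,b"))
```

Each call to split_on_comma costs an extra Python-level function call plus an isinstance check, which is the per-iteration overhead being discussed here.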

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Nick Coghlan
On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg m...@egenix.com wrote:
 Note that the point of using a builtin method was to get
 better performance. Such type adaptions are often needed in
 loops, so adding a few extra Python function calls just to
 convert a str object to a bytes object or vice-versa is a
 bit much overhead.

I actually agree with that, I just think we need more real world
experience as to what works with the Python 3 text model before we
start messing with the APIs for the builtin objects (fair point that
coerce is a loaded term given the existence of the old coercion
protocol. It's the right word for the task though).

One of the key points coming out of this thread (to my mind) is the
lack of a Text ABC or other way of making an object that can be passed
to functions expecting a str instance with a reasonable expectation of
having it work. Are there some core string capabilities that can be
identified and then expanded out to a full str-compatible API? (i.e.
something along the lines of what collections.MutableMapping now
provides for dict-alikes).
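For comparison, this is roughly what the mapping ABC gives a dict-alike today (it now lives in collections.abc). The class LowerCaseDict is a made-up example: it defines five methods, and the ABC supplies the rest of the dict API on top of them.

```python
from collections.abc import MutableMapping


class LowerCaseDict(MutableMapping):
    """Dict-alike that normalises its keys to lower case."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # update() is inherited from the ABC

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = LowerCaseDict(Host="example.org")
d.setdefault("Accept", "*/*")  # also inherited, never written by hand
```

A Text ABC along the lines asked about would need an equivalent small core from which the remaining str methods could be derived.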

However, even if something like that was added, PJE is correct in
pointing out that builtin strings still don't play well with others in
many cases (usually due to underlying optimisations or other sound
reasons, but perhaps sometimes gratuitously). Most of the string
binary operations can be dealt with through their reflected forms, but
str.__mod__ will never return NotImplemented, __contains__ has no
reflected form and the actual method calls are of course right out
(e.g. the arguments to str.join() or str.split() calls have no ability
to affect the type of the result).
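A small experiment shows these cases side by side. Tagged is a hypothetical third-party text type, deliberately not a str subclass:

```python
class Tagged:
    """Hypothetical third-party text type (not a str subclass)."""

    def __init__(self, text):
        self.text = text

    def __radd__(self, other):
        # Python falls back to this reflected method for `str + Tagged`.
        return Tagged(other + self.text)

    def __rmod__(self, other):
        # Never reached for `str % Tagged`: str.__mod__ always handles it.
        raise AssertionError("unreachable")

    def __str__(self):
        return self.text


t = Tagged("world")
concatenated = "hello " + t   # a Tagged, produced by __radd__
formatted = "hello %s" % t    # a plain str; Tagged is not consulted

try:
    t in "hello world"        # __contains__ has no reflected form
    contains_error = None
except TypeError as exc:
    contains_error = exc

try:
    ", ".join([t])            # method arguments cannot affect the result type
    join_error = None
except TypeError as exc:
    join_error = exc
```

Only the concatenation lets the third-party type take part; formatting quietly flattens it to str, and the other two operations reject it outright.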

Third party number implementations couldn't provide comparable
functionality to builtin int and long objects until the __index__
protocol was added. Perhaps PJE is right that what this is really
crying out for is a way to have third party "real string"
implementations so that there can actually be genuine experimentation
in the Unicode handling space outside the language core (comparable to
the difference between the "you can turn me into an int" __int__
method and the "I am an int equivalent" __index__ method).
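The __int__ / __index__ distinction can be seen directly with two made-up classes:

```python
class Convertible:
    """Can be turned into an int on request, but is not an integer."""

    def __int__(self):
        return 1


class IntegerLike:
    """Declares itself an exact integer equivalent."""

    def __index__(self):
        return 1


items = ["a", "b", "c"]

# Sequence indexing accepts anything that implements __index__.
via_index = items[IntegerLike()]

# An object that only offers __int__ is rejected as an index.
try:
    items[Convertible()]
    int_only_error = None
except TypeError as exc:
    int_only_error = exc

explicit = int(Convertible())  # explicit conversion still works
```

There is no corresponding hook for strings: nothing lets an object say "I am a str equivalent" to the builtin string operations.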

That may be tapping in a nail with a sledgehammer (and would raise
significant moratorium questions if pursued further), but I think it's
a valid question to at least ask.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] WPython 1.1 was released

2010-06-23 Thread Steven D'Aprano
On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
 I've released WPython 1.1, which brings many optimizations and
 refactorings.

For those of us who don't know what WPython is, and are too lazy, too 
busy, or reading their email off-line, could you give us a one short 
paragraph description of what it is?

Actually, since I'm none of the above, I'll answer my own question: 
WPython is an implementation of Python that uses 16-bit wordcodes 
instead of byte code, and claims to have various performance benefits 
from doing so.

It looks like good work, thank you.



-- 
Steven D'Aprano


Re: [Python-Dev] WPython 1.1 was released

2010-06-23 Thread Cesare Di Mauro
2010/6/23 Steven D'Aprano st...@pearwood.info

 On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
  I've released WPython 1.1, which brings many optimizations and
  refactorings.

 For those of us who don't know what WPython is, and are too lazy, too
 busy, or reading their email off-line, could you give us a one short
 paragraph description of what it is?

 Actually, since I'm none of the above, I'll answer my own question:
 WPython is an implementation of Python that uses 16-bit wordcodes
 instead of byte code, and claims to have various performance benefits
 from doing so.

 It looks like good work, thank you.

 --
 Steven D'Aprano


Hi Steven,

sorry, I made a mistake, assuming that the project was known.

WPython is a CPython 2.6.4 implementation that uses wordcodes instead of
bytecodes. A wordcode is a word (16 bits, two bytes, in this case) used to
represent VM opcodes. This new encoding made it possible to simplify the
main loop of the virtual machine, making it easier to understand, maintain,
and extend; less space is required on average, and execution speed is
improved too.
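To give a rough idea of what a fixed 16-bit instruction word looks like, here is a toy encoder. The layout (opcode in the low byte, argument in the high byte, little-endian) is purely an assumption for illustration; the message above does not describe WPython's actual instruction format.

```python
import struct


def encode_word(opcode: int, arg: int) -> bytes:
    """Pack an 8-bit opcode and an 8-bit argument into one 16-bit word."""
    if not (0 <= opcode < 256 and 0 <= arg < 256):
        raise ValueError("opcode and arg must each fit in one byte")
    return struct.pack("<H", opcode | (arg << 8))


def decode_word(word: bytes):
    """Inverse of encode_word: return (opcode, arg)."""
    (value,) = struct.unpack("<H", word)
    return value & 0xFF, value >> 8


word = encode_word(100, 3)
print(len(word), decode_word(word))
```

With every instruction the same width, an interpreter loop can fetch the opcode and its argument in a single step instead of testing whether an argument follows.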

Cesare


Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Steve Holden
Guido van Rossum wrote:
 On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
 Any turdiness (which I am *not* arguing for) is a natural consequence
 of the kinds of backward incompatibilities which were *not* ruled out
 for Python 3, along with the (early, now waning) "build it and they will
  come" optimism about adoption rates.
 
 FWIW, my optimism is *not* waning. I think it's good that we're
 having this discussion and I expect something useful will come out of
 it; I also expect in general that the (admittedly serious) problem of
 having to port all dependencies will be solved in the next few years.
 Not by magic, but because many people are taking small steps in the
 right direction, and there will be light eventually. In the mean time
 I don't blame anyone for sticking with 2.x or being too busy to help
 port stuff to 3.x. Python 3 has been a long time in the making -- it
 will be a bit longer still, which was expected.
 
+1

The important thing is to avoid bigotry and FUD, and deal with things
the way they are. The #python IRC team have just helped us make a major
step forward. This won't be a campaign with a victorious charge over
some imaginary finish line.

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000


Re: [Python-Dev] red buildbots on 2.7

2010-06-23 Thread Alexander Belopolsky
On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote:
..
 I don't agree.  The patch itself is pretty simple, but it does make a rather
 significant change to the build process: the compile-time environment in
 configure would be different from the one used during the compilation of
 posixmodule.  That is, the checks for features (the HAVE_FOOBAR macros in
 pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't.
 This may lead to subtle bugs, or even compile errors (because some function
 definitions change when _DARWIN_C_SOURCE is active).

I agree.  Messing with compatibility macros outside of pyconfig.h is
not a good idea.  Martin's hack, while likely to work in most cases,
is still a hack.  I believe, however, that we can undefine _DARWIN_C_SOURCE
globally at least on 10.4 and higher.  I grepped through the headers
on my 10.6 system and I notice that the majority of checks for
_DARWIN_C_SOURCE are in the form of

#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)

According to a comment in configure,

  # On Mac OS X 10.4, defining _POSIX_C_SOURCE or _XOPEN_SOURCE
  # disables platform specific features beyond repair.
  # On Mac OS X 10.3, defining _POSIX_C_SOURCE or _XOPEN_SOURCE
  # has no effect, don't bother defining them

_POSIX_C_SOURCE is already undefined in python headers, so undefining
_DARWIN_C_SOURCE will have no effect on the majority of checks.
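The claim follows from the truth table of that guard, which can be modelled in a few lines. This is a model of the preprocessor condition only, with made-up names, not a substitute for checking the actual headers:

```python
from itertools import product


def guard(posix_defined: bool, darwin_defined: bool) -> bool:
    """Model of: #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)"""
    return (not posix_defined) or darwin_defined


# Evaluate the guard for every combination of the two macros.
table = {
    (posix, darwin): guard(posix, darwin)
    for posix, darwin in product([False, True], repeat=2)
}

for (posix, darwin), visible in sorted(table.items()):
    print(f"_POSIX_C_SOURCE={posix!s:5} _DARWIN_C_SOURCE={darwin!s:5} -> {visible}")
```

With _POSIX_C_SOURCE undefined, the guard is true whether or not _DARWIN_C_SOURCE is defined; the macro only matters when _POSIX_C_SOURCE is defined.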

I was able to find very few exceptions:  some cases check
_XOPEN_SOURCE instead of or in addition to _POSIX_C_SOURCE before
ignoring _DARWIN_C_SOURCE:

/usr/include/grp.h:#if !defined(_XOPEN_SOURCE) || defined(_DARWIN_C_SOURCE)
/usr/include/pwd.h:#if (!defined(_POSIX_C_SOURCE) &&
!defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE)
..

Since _XOPEN_SOURCE is similarly undefined in python headers, these
cases are unaffected as well.

This leaves a handful of cases where Apple provides additional macros
for fine grained control:

/usr/include/stdio.h:#if defined(__DARWIN_10_6_AND_LATER) &&
(defined(_DARWIN_UNLIMITED_STREAMS) || defined(_DARWIN_C_SOURCE))
/usr/include/unistd.h:#if defined(_DARWIN_UNLIMITED_GETGROUPS) ||
defined(_DARWIN_C_SOURCE)

The second line above is our dear friend, and the _DARWIN_C_SOURCE
behavior conditioned on the first line can be enabled by defining the
_DARWIN_UNLIMITED_STREAMS macro.

I believe _DARWIN_C_SOURCE casts its net too wide and more targeted
macros should be used instead.

..
 Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel
 calls to conform to the SUSv3 standards even if doing so would alter the
 behavior of functions used in 10.3.

I cannot reconcile this with !defined(_POSIX_C_SOURCE) ||
defined(_DARWIN_C_SOURCE) logic that I see in the headers.


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread P.J. Eby

At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote:

I suspect the practical problem here is that there's no CharacterString ABC


That, and the absence of a string coercion protocol so that mixing 
your custom string with standard strings will do the right thing for 
your intended use.




[Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7

2010-06-23 Thread Alexander Belopolsky
On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote:
..

 * [Ronald's proposal] results in posix.getgroups not reflecting results of 
 posix.setgroups


 This effectively substitutes getgrouplist called on the current user
 for getgroups.  In 3.x, I believe the correct action will be to
 provide direct access to getgrouplist which, while not POSIX (yet?),
 is widely available.

 I don't mind adding getgrouplist, but that issue is separate from this one.
 BTW, apparently getgrouplist is POSIX
 (http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/libc.html),
 although this isn't a requirement for being added to the posix module.


(The link you provided leads to the Linux Standard Base Core
Specification, which is different from POSIX, but the distinction is
not relevant for our discussion.)


 It is still my opinion that the second option is preferable for better 
 compatibility with system tools, even if the patch
 is more complicated and the library function we use can be considered to be 
 broken.

Let me try to formulate what the disagreement is.  There are two
different group lists that can be associated with a running process:
1) The list of current supplementary group IDs maintained by the
system for each process and stored in per-process system tables; and
2) The list of the groups that include the uid under which the process
is running as a member.

The first list is returned by a system call getgroups and the second
can be obtained using system database access functions as follows:

pw = getpwuid(getuid())
getgrouplist(pw->pw_name, ..)

The first list can be modified by privileged processes using setgroups
system call, while the second changes when system databases change.

The problem that _DARWIN_C_SOURCE introduces is that it replaces
system getgroups with a database query effectively making the true
process' list of supplementary group IDs inaccessible to programs.
See source code at
http://www.opensource.apple.com/source/Libc/Libc-594.1.4/sys/getgroups.c.

The problem is complicated by the fact that the true getgroups call on
OS X appears to truncate the list of groups to NGROUPS_MAX=16.  Note,
however, that it is not clear whether the system call truncates the
list or the underlying process tables are limited to 16 entries and
additional groups are ignored when the process is created.

In my view, getgroups and getgrouplist are two fundamentally different
operations and both should be provided by the os module.  Redefining
os.getgroups to invoke getgrouplist instead of system getgroups on one
particular platform to work around that platform's system call
limitation is not right.
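In current Python both operations are exposed separately: os.getgroups() has long been available, and os.getgrouplist() was added in Python 3.3. The sketch below assumes a Unix platform; the wrapper names are hypothetical, and what os.getgroups() returns on Mac OS X depends on how the interpreter was built.

```python
import os
import pwd


def process_groups():
    """Supplementary group IDs held by the running process (getgroups)."""
    return sorted(os.getgroups())


def database_groups():
    """Groups the user database lists for the process's uid (getgrouplist)."""
    try:
        entry = pwd.getpwuid(os.getuid())
    except KeyError:
        return None  # the uid has no passwd entry, e.g. in some containers
    return sorted(os.getgrouplist(entry.pw_name, entry.pw_gid))


print("process: ", process_groups())
print("database:", database_groups())
```

The two lists can differ, for example after a privileged process calls setgroups, or when the group database changes while the process is running.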


Re: [Python-Dev] red buildbots on 2.7

2010-06-23 Thread ronaldoussoren
On 23 Jun, 2010, at 04:06 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote:
On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote:
..
 I don't agree.  The patch itself is pretty simple, but it does make a rather
 significant change to the build process: the compile-time environment in
 configure would be different from the one used during the compilation of
 posixmodule.  That is, the checks for features (the HAVE_FOOBAR macros in
 pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't.
 This may lead to subtle bugs, or even compile errors (because some function
 definitions change when _DARWIN_C_SOURCE is active).

I agree.  Messing with compatibility macros outside of pyconfig.h is
not a good idea.  Martin's hack, while likely to work in most cases,
is still a hack.  I believe, however, that we can undefine _DARWIN_C_SOURCE
globally at least on 10.4 and higher.  I grepped through the headers
on my 10.6 system and I notice that the majority of checks for
_DARWIN_C_SOURCE are in the form of

As I wrote, the system will assume _DARWIN_C_SOURCE is set when you don't
set _POSIX_C_SOURCE or other feature macros.  Working around that is a hack
that I don't wish to support.


..
 Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel
 calls to conform to the SUSv3 standards even if doing so would alter the
 behavior of functions used in 10.3.

I cannot reconcile this with !defined(_POSIX_C_SOURCE) ||
defined(_DARWIN_C_SOURCE) logic that I see in the headers

This seems to be arranged in sys/cdefs.h.  I honestly don't care how this is
done; the documentation clearly says that this happens, and that indicates
that _DARWIN_C_SOURCE selects the API Apple would like you to use.

Anyway, why is this discussion on python-dev instead of in the issue tracker?

BTW, IMHO resolution of this issue can wait until after 2.7.0; there is
always 2.7.1 and I don't think we need to rush this (the issue has been
dormant for quite a while).

Ronald



Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Tres Seaver

Stephen J. Turnbull wrote:

 We do need str-based implementations of modules like urllib.

Why would that be?  URLs aren't text, and never will be.  The fact that
to the eye they may seem to be text-ish doesn't make them text.  This
*is* a case where "don't make me think" is a losing proposition:
programmers who work with URLs in any non-opaque way as text are
eventually going to be bitten by this issue no matter how hard we wave
our hands.


Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com


Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7

2010-06-23 Thread Alexander Belopolsky
In my previous post, I forgot to include the link to the tracker issue
where this problem is being worked on.

http://bugs.python.org/issue7900

I'll repost my message there as an issue comment, so that a more
detailed technical discussion can continue there.


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Guido van Rossum
On Wed, Jun 23, 2010 at 8:30 AM, Tres Seaver tsea...@palladion.com wrote:
 Stephen J. Turnbull wrote:

 We do need str-based implementations of modules like urllib.

 Why would that be?  URLs aren't text, and never will be.  The fact that
 to the eye they may seem to be text-ish doesn't make them text.  This
 *is* a case where "don't make me think" is a losing proposition:
 programmers who work with URLs in any non-opaque way as text are
 eventually going to be bitten by this issue no matter how hard we wave
 our hands.

This has been asserted and contested several times now, and I don't
see the two positions getting any closer.

So I propose that we drop the discussion "are URLs text or bytes" and
try to find something more pragmatic to discuss.

For example: how we can make the suite of functions used for URL
processing more polymorphic, so that each developer can choose for
herself how URLs need to be treated in her application.
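As a point of reference, this is how the polymorphic approach looks in urllib.parse since Python 3.2: the same functions accept either str or bytes and return results of the matching type, but refuse to mix the two. The helper name host_of is made up for the example, and the bytes path assumes ASCII-only input.

```python
from urllib.parse import urljoin, urlsplit


def host_of(url):
    """Return the network location in the same type as the input."""
    return urlsplit(url).netloc


text_host = host_of("http://example.org/a")     # str in, str out
bytes_host = host_of(b"http://example.org/a")   # bytes in, bytes out

# Mixing the two types in one call is rejected rather than guessed at.
try:
    urljoin("http://example.org/", b"page")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```

The application picks its representation once, at the boundary, and the library follows it.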

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities

2010-06-23 Thread Craig Younkins
http://bugs.python.org/issue9061

On Tue, Jun 22, 2010 at 5:29 PM, Bill Janssen jans...@parc.com wrote:

 Craig Younkins cyounk...@gmail.com wrote:

  cgi.escape never escapes single quote characters, which can easily lead
  to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by
  many, but a quick search reveals many are using cgi.escape for HTML
  attribute escaping.

 Did you file a bug report?

 Bill
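The problem can be reproduced with html.escape, the function that replaced cgi.escape (cgi.escape itself was removed in Python 3.8). The attacker-controlled string below is a made-up example.

```python
from html import escape

user_input = "' onmouseover='alert(1)"

# Escaping only &, < and > (quote=False) leaves both quote characters alone,
# so the input can close a single-quoted attribute and add a new one.
unsafe_attr = "<a title='%s'>x</a>" % escape(user_input, quote=False)

# quote=True (the default for html.escape) also escapes " and '.
safe_attr = "<a title='%s'>x</a>" % escape(user_input)

print(unsafe_attr)
print(safe_attr)
```

Only the second form keeps the injected event handler inside the attribute value.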



Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Barry Warsaw
On Jun 23, 2010, at 08:43 AM, Guido van Rossum wrote:

So I propose that we drop the discussion "are URLs text or bytes" and
try to find something more pragmatic to discuss.

email has exactly the same question, and the answer is yes. <wink>

For example: how we can make the suite of functions used for URL
processing more polymorphic, so that each developer can choose for
herself how URLs need to be treated in her application.

I think email package hackers should watch this effort closely.  RDM has
written some stuff up on how we think we're going to handle this, though it's
probably pretty email package specific.  Maybe there's a better, general, or
conventional approach lurking around somewhere.

http://wiki.python.org/moin/Email%20SIG

-Barry




Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Bill Janssen
Tres Seaver tsea...@palladion.com wrote:

 Stephen J. Turnbull wrote:
 
  We do need str-based implementations of modules like urllib.
 
 Why would that be?  URLs aren't text, and never will be.  The fact that
 to the eye they may seem to be text-ish doesn't make them text.  This

URLs are exactly text (strings, representable as Unicode strings in
Py3K), and were designed as such from the start.  The fact that some of
the things tunneled or carried in URLs are string representations of
non-string data shouldn't obscure that point.  They're not text-ish,
they're text.  They're not opaque, either; they break down in
well-specified ways, mainly into strings.

The trouble comes in when we try to go beyond the spec, or handle things
that don't conform to the spec.  Sure, a path component of a URI might
actually be a %-escaped sequence of arbitrary bytes, even bytes that
don't represent a string in any known encoding, but that's only *after*
reversing the %-escapes, which should happen in a scheme-specific piece
of code, not in generic URL parsing or manipulation.
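
A small sketch of that ordering, using only urllib.parse calls that exist in
Python 3. The example URL and the choice of UTF-8 for the query are
assumptions made for illustration:

```python
from urllib.parse import urlsplit, unquote_to_bytes

# Generic step: split the URL while it is still pure text.
url = 'http://example.com/files/%FF%FEraw?name=R%C3%A9union'
parts = urlsplit(url)

# Scheme-specific step: only now reverse the %-escapes.
# The path here carries arbitrary bytes that are not valid UTF-8.
path_bytes = unquote_to_bytes(parts.path)

# The query happens to be UTF-8 in this example, so it can become text again.
query_text = unquote_to_bytes(parts.query).decode('utf-8')
```

The generic parser never has to know what the escaped bytes mean.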

Bill



Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Ian Bicking
On Wed, Jun 23, 2010 at 10:30 AM, Tres Seaver tsea...@palladion.com wrote:

  Stephen J. Turnbull wrote:

  We do need str-based implementations of modules like urllib.


 Why would that be?  URLs aren't text, and never will be.  The fact that
 to the eye they may seem to be text-ish doesn't make them text.  This
 *is* a case where "don't make me think" is a losing proposition:
 programmers who work with URLs in any non-opaque way as text are
 eventually going to be bitten by this issue no matter how hard we wave
 our hands.


HTML is text, and URLs are embedded in that text, so it's easy to get a URL
that is text.  Though, with a little testing, I notice that text alone can't
tell you what the right URL really is (at least the intended URL when unsafe
characters are embedded in HTML).

To test I created two pages, one in Latin-1 another in UTF-8, and put in the
link:

  ./test.html?param=Réunion

On a Latin-1 page it created a link to test.html?param=R%E9union and on a
UTF-8 page it created a link to test.html?param=R%C3%A9union (the second
link displays in the URL bar as test.html?param=Réunion but copies with
percent encoding).  Though if you link to ./Réunion.html then both pages
create UTF-8 links.  And both pages also link http://Réunion.com to
http://xn--runion-bva.com/.  So really neither bytes nor text works
completely; query strings receive the encoding of the page, which would be
handled transparently if you worked on the page's bytes.  Path and domain
are consistently encoded with UTF-8 and punycode respectively and so would
be handled best when treated as text.  And of course if you are a page with
a non-ASCII-compatible encoding you really must handle encodings before the
URL is sensible.
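
The three behaviours observed above can be reproduced roughly like this. It
is a sketch of the encodings involved, not of what any browser actually runs:

```python
from urllib.parse import quote

param = 'Réunion'

# Query string: percent-encoded using the encoding of the containing page.
latin1_query = quote(param, encoding='latin-1')   # 'R%E9union'
utf8_query = quote(param, encoding='utf-8')       # 'R%C3%A9union'

# Domain name: IDNA (punycode), independent of the page encoding.
host = 'Réunion.com'.encode('idna')               # b'xn--runion-bva.com'
```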

Another issue here is that there's no encoding for turning a URL into
bytes if the URL is not already ASCII.  A proper way to encode a URL would
be:

(Totally as an aside, as I remind myself of new module names I notice it's
not easy to google specifically for Python 3 docs, e.g. python 3 urlsplit
gives me 2.6 docs)

from urllib.parse import urlsplit, urlunsplit
import encodings.idna

def encode_http_url(url, page_encoding='ASCII', errors='strict'):
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = scheme.encode('ASCII', errors)
    auth = port = None
    if '@' in netloc:
        auth, netloc = netloc.split('@', 1)
    if ':' in netloc:
        netloc, port = netloc.split(':', 1)
    netloc = encodings.idna.ToASCII(netloc)
    if port:
        netloc = netloc + b':' + port.encode('ASCII', errors)
    if auth:
        netloc = auth.encode('UTF-8', errors) + b'@' + netloc
    path = path.encode('UTF-8', errors)
    query = query.encode(page_encoding, errors)
    fragment = fragment.encode('UTF-8', errors)
    return urlunsplit_bytes((scheme, netloc, path, query, fragment))

Where urlunsplit_bytes handles bytes (urlunsplit does not).  It's helpful
for me at least to look at that code specifically:

def urlunsplit(components):
    scheme, netloc, url, query, fragment = components
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url

In this case it really would be best to have Python 2's system where things
are coerced to ASCII implicitly.  Or, more specifically, if all those string
literals in that routine could be implicitly converted to bytes using
ASCII.  Conceptually I think this is reasonable, as for URLs (at least with
HTTP, but in practice I think this applies to all URLs) the ASCII bytes
really do have meaning.  That is, '/' (*in the context of urlunsplit*)
really is \x2f specifically.  Or another example, making a GET request
really means sending the bytes \x47\x45\x54 and there is no other set of
bytes that has that meaning.  The WebSockets specification for instance
defines things like colon:
http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76#page-5 -- in
an earlier version they even used bytes to describe HTTP (
http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-54#page-13),
though this annoyed many people.
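
For comparison, here is a sketch of the same routine with every literal
written as an explicit bytes literal. The name urlunsplit_bytes comes from the
code above; the uses_netloc default below is a shortened stand-in for the
module-level list, assumed for illustration:

```python
def urlunsplit_bytes(components, uses_netloc=(b'http', b'https', b'ftp')):
    """Bytes-only counterpart of urlunsplit: all delimiters are ASCII bytes."""
    scheme, netloc, url, query, fragment = components
    if netloc or (scheme and scheme in uses_netloc and url[:2] != b'//'):
        # Slicing (url[:1]) keeps the comparison bytes-to-bytes;
        # indexing (url[0]) would yield an int in Python 3.
        if url and url[:1] != b'/':
            url = b'/' + url
        url = b'//' + (netloc or b'') + url
    if scheme:
        url = scheme + b':' + url
    if query:
        url = url + b'?' + query
    if fragment:
        url = url + b'#' + fragment
    return url

result = urlunsplit_bytes(
    (b'http', b'xn--runion-bva.com', b'/test.html', b'param=R%E9union', b''))
```

The logic is unchanged; only the b prefixes differ, which is the duplication
that implicit ASCII coercion would avoid.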

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Bill Janssen
Guido van Rossum gu...@python.org wrote:

 So I propose that we drop the discussion "are URLs text or bytes" and
 try to find something more pragmatic to discuss.
 
 For example: how we can make the suite of functions used for URL
 processing more polymorphic, so that each developer can choose for
 herself how URLs need to be treated in her application.

While I agree with "find something more pragmatic to discuss", it also
seems to me that introducing polymorphic URL processing might make
things more confusing and error-prone.

The bigger problem seems to be that we're revisiting the design
discussion about urllib.parse from the summer of 2008.  See
http://bugs.python.org/issue3300 if you want to recall how we hashed
this out 2 years ago.  I didn't particularly like that design, but I had
to go off on vacation :-), and things got settled while I was away.  I
haven't heard much from Matt Giuca since he stopped by and lobbed that
patch into the standard library.

But since Guido is the one who settled it, why are we talking about it
again?

Bill


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Ian Bicking
Oops, I forgot some important quoting (important for the algorithm,
maybe not actually for the discussion)...

from urllib.parse import urlsplit, urlunsplit
import encodings.idna

# urllib.parse.quote both always returns str, and is not as
# conservative in quoting as required here...
def quote_unsafe_bytes(b):
    result = []
    for c in b:
        # Percent-encode control bytes and anything outside 7-bit ASCII.
        if c < 0x20 or c >= 0x80:
            result.extend(('%%%02X' % c).encode('ASCII'))
        else:
            result.append(c)
    return bytes(result)

def encode_http_url(url, page_encoding='ASCII', errors='strict'):
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = scheme.encode('ASCII', errors)
    auth = port = None
    if '@' in netloc:
        auth, netloc = netloc.split('@', 1)
    if ':' in netloc:
        netloc, port = netloc.split(':', 1)
    netloc = encodings.idna.ToASCII(netloc)
    if port:
        netloc = netloc + b':' + port.encode('ASCII', errors)
    if auth:
        netloc = (quote_unsafe_bytes(auth.encode('UTF-8', errors)) +
                  b'@' + netloc)
    path = quote_unsafe_bytes(path.encode('UTF-8', errors))
    query = quote_unsafe_bytes(query.encode(page_encoding, errors))
    fragment = quote_unsafe_bytes(fragment.encode('UTF-8', errors))
    return urlunsplit_bytes((scheme, netloc, path, query, fragment))
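
As a quick check of the quoting step on its own, here is the helper repeated
so the snippet runs standalone, applied to the query from the earlier
Latin-1 / UTF-8 page experiment:

```python
def quote_unsafe_bytes(b):
    # Same logic as the helper in this message, repeated to be self-contained.
    result = []
    for c in b:
        if c < 0x20 or c >= 0x80:
            # '%%%02X' renders the byte as a literal '%' plus two hex digits.
            result.extend(('%%%02X' % c).encode('ASCII'))
        else:
            result.append(c)
    return bytes(result)

latin1 = quote_unsafe_bytes('param=Réunion'.encode('latin-1'))
utf8 = quote_unsafe_bytes('param=Réunion'.encode('utf-8'))
```

Plain ASCII bytes pass through untouched, so already-escaped input is not
escaped a second time.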



--
Ian Bicking  |  http://blog.ianbicking.org


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Glyph Lefkowitz

On Jun 22, 2010, at 8:57 PM, Robert Collins wrote:

 bzr has a cache of decoded strings in it precisely because decode is
 slow. We accept slowness encoding to the user's locale because that's
 typically much less data to examine than we've examined while
 generating the commit/diff/whatever. We also face memory pressure on a
 regular basis, and that has been, at least partly, due to UCS4 - our
 translation cache helps there because we have less duplicate UCS4
 strings.

Thanks for setting the record straight - apologies if I missed this earlier in 
the thread.  It does seem vaguely familiar.



Re: [Python-Dev] WPython 1.1 was released

2010-06-23 Thread Terry Reedy

On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:


sorry, I made a mistake, assuming that the project was known.


A common mistake of people who announce their projects ;-)
Someone recently made the same mistake on python-list with respect to a 
'BDD' package (Wikipedia suggests about 6 possible expansions of the 
acronym).


WPython is a CPython 2.6.4 implementation that uses wordcodes instead
of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)


I suggest you specify the base version (2.6.4) on the project page as 
that would be very relevant to many who visit. One should not have to 
download and look at the source to discover if they should 
bother downloading the code. Perhaps also add a sentence as to the 
choice (why not 3.1?).



--
Terry Jan Reedy



Re: [Python-Dev] WPython 1.1 was released

2010-06-23 Thread Cesare Di Mauro
2010/6/23 Terry Reedy tjre...@udel.edu

 On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
 WPython is a CPython 2.6.4 implementation that uses wordcodes instead
 of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)

 I suggest you specify the base version (2.6.4) on the project page as that
 would be very relevant to many who visit. One should not have to download
 and look at the source to discover if they should bother
 downloading the code. Perhaps also add a sentence as to the choice (why not
 3.1?).

 --
 Terry Jan Reedy


Thanks for the suggestions. I've updated the main project accordingly. :)

Cesare


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Tres Seaver

Bill Janssen wrote:

 The bigger problem seems to be that we're revisiting the design
 discussion about urllib.parse from the summer of 2008.  See
 http://bugs.python.org/issue3300 if you want to recall how we hashed
 this out 2 years ago.  I didn't particularly like that design, but I had
 to go off on vacation :-), and things got settled while I was away.  I
 haven't heard much from Matt Giuca since he stopped by and lobbed that
 patch into the standard library.
 
 But since Guido is the one who settled it, why are we talking about it
 again?

Perhaps such decisions need revisiting in light of subsequent experience
/ pain / learning.  E.g:

- the repeated inability of the web-sig to converge on appropriate
  semantics for a Python3-compatible version of the WSGI spec;

- the subsequent quirkiness of the Python3 wsgiref implementation;

- the breakage in cgi.py which prevents handling file uploads in a
  web application;

- the slow adoption / porting rate of major web frameworks and libraries
  to Python 3.



Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Design   http://palladion.com


Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7

2010-06-23 Thread Martin v. Löwis

The problem that _DARWIN_C_SOURCE introduces is that it replaces
system getgroups with a database query effectively making the true
process' list of supplementary group IDs inaccessible to programs.
See source code at
http://www.opensource.apple.com/source/Libc/Libc-594.1.4/sys/getgroups.c.


If that is true (i.e. the file is really the one that is being used),
I think this is a severe flaw in OSX's implementation of the POSIX 
specification.


Then, I agree that Python, in turn, should make sure that 
posix.getgroups is really the POSIX version of getgroups, not the Apple 
version. This is a general principle: if the system has two competing 
implementations of some API, the Python posix module should strive to 
call the POSIX version of the API. If the vendor's version of the API is 
also useful, it can be exposed under a different name (if, in turn, this 
is technically possible).


Just my 0.02€.

Regards,
Martin


Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Glyph Lefkowitz

On Jun 23, 2010, at 8:17 AM, Steve Holden wrote:

 Guido van Rossum wrote:
 On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
 Any turdiness (which I am *not* arguing for) is a natural consequence
 of the kinds of backward incompatibilities which were *not* ruled out
 for Python 3, along with the (early, now waning) "build it and they will
 come" optimism about adoption rates.
 
 FWIW, my optimism is *not* waning. I think it's good that we're
 having this discussion and I expect something useful will come out of
 it; I also expect in general that the (admittedly serious) problem of
 having to port all dependencies will be solved in the next few years.
 Not by magic, but because many people are taking small steps in the
 right direction, and there will be light eventually. In the mean time
 I don't blame anyone for sticking with 2.x or being too busy to help
 port stuff to 3.x. Python 3 has been a long time in the making -- it
 will be a bit longer still, which was expected.
 
 +1
 
 The important thing is to avoid bigotry and FUD, and deal with things
 the way they are. The #python IRC team have just helped us make a major
 step forward. This won't be a campaign with a victorious charge over
 some imaginary finish line.

For sure.

I don't speak for Tres, but I think he wasn't talking about optimism about 
*adoption*, overall, but optimism about adoption *rates*.  And I don't 
think he was talking about it coming from Guido :).

There has definitely been some irrational exuberance from some quarters.  The 
form it usually takes is someone making a blog post which assumes, because the 
author could port their smallish library or application without too much 
hassle, that Python 2.x is already dead and everyone should be off of it in a 
couple of weeks.

I've never heard this position from the core team or any official communication 
or documentation.  Far from it: the realistic attitude that the Python 3 
migration is something that will take a while has significantly reduced my own 
concerns.

Even the aforementioned blog posts have been encouraging in some ways, because 
a lot of people are reporting surprisingly easy transitions.



Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Tres Seaver

Glyph Lefkowitz wrote:

 I don't speak for Tres, but I think he wasn't talking about
 optimism about *adoption*, overall, but optimism about adoption
 *rates*.  And I don't think he was talking about it coming from Guido
 :).

You channel me correctly here.  In particular, the phrase "build it and
they will come" was meant to address the idea that the only thing needed
to drive adoption was the release of the new, shiny Python3.  That
particular bit of optimism is what I meant to describe as waning:  the
community on the whole seems to be more realistic now than two or three
years ago about the kind of extra effort required from both core
developers and from existing Python 2 folks to get to Python 3.

 There has definitely been some irrational exuberance from some
 quarters.  The form it usually takes is someone making a blog post
 which assumes, because the author could port their smallish library
 or application without too much hassle, that Python 2.x is already
 dead and everyone should be off of it in a couple of weeks.
 
 I've never heard this position from the core team or any official
 communication or documentation.  Far from it: the realistic attitude
 that the Python 3 migration is something that will take a while has
 significantly reduced my own concerns.
 
 Even the aforementioned blog posts have been encouraging in some
 ways, because a lot of people are reporting surprisingly easy
 transitions.

Indeed.


Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Design   http://palladion.com


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Antoine Pitrou
On Wed, 23 Jun 2010 14:23:33 -0400
Tres Seaver tsea...@palladion.com wrote:
 
 Perhaps such decisions need revisiting in light of subsequent experience
 / pain / learning.  E.g:
 
 - - the repeated inability of the web-sig to converge on appropriate
   semantics for a Python3-compatible version of the WSGI spec;
 
 - - the subsequent quirkiness of the Python3 wsgiref implementation;

The way wsgiref was adapted is admittedly suboptimal. It was totally
broken at first, and PJE didn't want to look very deeply into it. We
therefore had to settle on a series of small modifications that seemed
rather reasonable, but without any in-depth discussion of what WSGI had
to look like under Python 3 (since it was not our job and responsibility).

Therefore, I don't think wsgiref should be taken as a guide to what
a cleaned up, Python 3-specific WSGI must look like.

 - - the slow adoption / porting rate of major web frameworks and libraries
   to Python 3.

Some of the major web frameworks and libraries have a ton of
dependencies, which would explain why they really haven't bothered yet.

I don't think you can claim, though, that Python 3 makes things
significantly harder for these frameworks. The proof is that many of
them already give the user unicode strings in Python 2.x. They must
have somehow got the decoding right.

Regards

Antoine.




Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7

2010-06-23 Thread Ronald Oussoren

On 23 Jun, 2010, at 16:48, Alexander Belopolsky wrote:

 On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com 
 wrote:
 ..
 
 * [Ronald's proposal] results in posix.getgroups not reflecting results of 
 posix.setgroups
 
 
 This effectively substitutes getgrouplist called on the current user
 for getgroups.  In 3.x, I believe the correct action will be to
 provide direct access to getgrouplist which is while not POSIX (yet?),
 is widely available.
 
 I don't mind adding getgrouplist, but that issue is separate from this one. 
 BTW. Apparently getgrouplist is posix
 (http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/libc.html),
  although this isn't a
 requirement for being added to the posix module.
 
 
 (The link you provided leads to Linux Standard Base Core
 Specification, which is different from POSIX, but the distinction is
 not relevant for our discussion.)

I know, but the page claims getgrouplist is in SUS.  I've since looked at what 
claims to be a copy of SUS: http://www.unix.org/single_unix_specification/ and 
that does not contain getgrouplist. 

 
 
 It is still my opinion that the second option is preferable for better 
 compatibility with system tools, even if the patch
 is more complicated and the library function we use can be considered to be 
 broken.
 
 Let me try to formulate what the disagreement is.  There are two
 different group lists that can be associated with a running process:
 1) The list of current supplementary group IDs maintained by the
 system for each process and stored in per-process system tables; and
 2) The list of the groups that include the uid under which the process
 is running as a member.
 
 The first list is returned by a system call getgroups and the second
 can be obtained using system database access functions as follows:
 
 pw = getpwuid(getuid())
 getgrouplist(pw->pw_name, ..)
 
 The first list can be modified by privileged processes using setgroups
 system call, while the second changes when system databases change.
 
 The problem that _DARWIN_C_SOURCE introduces is that it replaces
 system getgroups with a database query effectively making the true
 process' list of supplementary group IDs inaccessible to programs.
 See source code at
 http://www.opensource.apple.com/source/Libc/Libc-594.1.4/sys/getgroups.c.
 
 The problem is complicated by the fact that OSX true getgroups call
 appears to truncate the list of groups to NGROUPS_MAX=16.  Note,
 however that it is not clear whether the system call truncates the
 list or the underlying process tables are limited to 16 entries and
 additional groups are ignored when the process is created.
 
 In my view, getgroups and getgrouplist are two fundamentally different
 operations and both should be provided by the os module.  Redefining
 os.getgroups to invoke getgrouplist instead of system getgroups on one
 particular platform to work around that platform's system call
 limitation is not right.

But we don't redefine os.getgroups to call getgrouplist; it is the system 
library that seems to implement getgroups(3) using getgrouplist(3).  I agree 
that that is odd at best, but it is IMHO functioning as designed by Apple 
(that is, Apple chose the current behavior; they didn't accidentally break 
this).

The previous paragraph is nitpicky, but this is IMO an important distinction.
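
To make the two lists concrete from the Python side, here is a sketch. It
assumes a Unix system and a Python that provides os.getgrouplist (added in
3.3, so not available in the versions under discussion); the function names
are made up for illustration:

```python
import os
import pwd

def process_groups():
    """List 1: supplementary group IDs the kernel holds for this process."""
    return sorted(os.getgroups())

def database_groups():
    """List 2: groups the user database says the current uid belongs to.

    Returns None when the uid has no passwd entry (possible in containers).
    """
    try:
        pw = pwd.getpwuid(os.getuid())
    except KeyError:
        return None
    return sorted(os.getgrouplist(pw.pw_name, pw.pw_gid))

from_process = process_groups()
from_database = database_groups()
```

On most systems the two lists agree right after login; they diverge after
setgroups() or after the group database is edited.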


I've done some more experimentation:

*  compat(5) lies: not setting _DARWIN_C_SOURCE is not the same as setting 
_DARWIN_C_SOURCE when the deployment target is 10.5. With _DARWIN_C_SOURCE, 
getgroups is translated to the symbol _getgroups$DARWIN_EXTSN in the object 
file; without it, the symbol is _getgroups.

* the id(1) command uses the version of getgroups that does not reflect 
setgroups. Given this script:
import os

os.system("id")
os.setgroups([1])
os.system("id")

Running it gives an unexpected output:

# /usr/bin/python doit.py
uid=0(root) gid=0(wheel) 
groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing)
uid=0(root) gid=0(wheel) 
groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing)

* when I add a group in the Accounts panel in System Preferences and add my 
account to it the id(1) command immediately reflects the change (as expected 
given the previous result)

* adding a non-administrator account to a newly created group does not affect 
filesystem access for existing processes (that is, I created a file that's 
only readable for the new group, and the test user couldn't read that file 
until I logged out and in again), which means the Accounts panel doesn't 
magically alter kernel state for running processes.

* Setting or unsetting 

Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Toshio Kuratomi
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
 On Wed, 23 Jun 2010 14:23:33 -0400
 Tres Seaver tsea...@palladion.com wrote:
  - - the slow adoption / porting rate of major web frameworks and libraries
to Python 3.
 
 Some of the major web frameworks and libraries have a ton of
 dependencies, which would explain why they really haven't bothered yet.
 
 I don't think you can claim, though, that Python 3 makes things
 significantly harder for these frameworks. The proof is that many of
 them already give the user unicode strings in Python 2.x. They must
 have somehow got the decoding right.
 
Note that this assumption seems optimistic to me.  I started talking to Graham
Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste
do decoding of bytes to unicode at different layers which caused problems
for application level code that should otherwise run fine when being served
by mod_wsgi or paste httpserver.  That was the beginning of Graham starting
to talk about what the wsgi spec really should look like under python3
instead of the broken way that the appendix to the current wsgi spec states.

-Toshio




Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Antoine Pitrou
On Wed, 23 Jun 2010 17:30:22 -0400
Toshio Kuratomi a.bad...@gmail.com wrote:
 Note that this assumption seems optimistic to me.  I started talking to Graham
 Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste
 do decoding of bytes to unicode at different layers which caused problems
 for application level code that should otherwise run fine when being served
 by mod_wsgi or paste httpserver.  That was the beginning of Graham starting
 to talk about what the wsgi spec really should look like under python3
 instead of the broken way that the appendix to the current wsgi spec states.

Ok, but the reason would be that the WSGI spec is broken. Not Python 3
itself.

Regards

Antoine.


[Python-Dev] swig/python and intel's threadedbuildginblocks

2010-06-23 Thread tullarisc

Hi,

I've compiled Intel's OSS Threading Building Blocks library on OpenBSD
and put everything in some swig interfaces.

Here you go: http://tullarisc.xtreemhost.com/swig.ttb.tgz

Love, tullarisc.


Re: [Python-Dev] [Web-SIG] bytes / unicode

2010-06-23 Thread Henry Precheur
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
 I don't think you can claim, though, that Python 3 makes things
 significantly harder for these frameworks. The proof is that many of
 them already give the user unicode strings in Python 2.x. They must
 have somehow got the decoding right.

Well... Frameworks usually 'simplify' the problem by partly ignoring it.
By default they assume the data in the request is UTF-8. You can specify
an alternative encoding in most of them. Django [1], Werkzeug [2], and
WebOb [3] do that.

The problem with this approach is that you still have to deal with weird
requests where one thing is unicode, and another is latin-1. Sometimes
you can even have 2 different encodings in a single header like Cookies.
There's no solution to this problem, it has to be solved on a case by
case basis.

There was a big discussion a while ago on web-sig. I think the consensus
was that WSGI for Python 3 should assume that the data is encoded in
latin-1 since it's the default encoding according to the RFC.
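
The practical appeal of latin-1 here is that decoding with it never fails and
loses nothing, so an application can re-decode later. A sketch (the variable
names and the UTF-8 payload are assumptions for illustration):

```python
# Bytes as they arrive on the wire; the client happened to send UTF-8.
raw = 'Réunion'.encode('utf-8')

# What a latin-1-based server layer would hand to the application:
# a str whose code points are just the original byte values.
as_str = raw.decode('latin-1')

# The application, knowing the real encoding, recovers the text.
recovered = as_str.encode('latin-1').decode('utf-8')
```

The intermediate str looks like mojibake, but it carries the bytes intact,
which is the whole point of the convention.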


[1] 
http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.encoding
[2] 
http://werkzeug.pocoo.org/documentation/dev/unicode.html#request-and-response-objects
[3] http://pythonpaste.org/webob/reference.html#unicode-variables

-- 
  Henry Prêcheur


[Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-23 Thread Brett Cannon
I finally realized why clang has not been silencing its warnings about
unused return values: I have -Wno-unused-value set in CFLAGS which
comes before OPT (which defines -Wall) as set in PY_CFLAGS in
Makefile.pre.in.

I could obviously set OPT in my environment, but that would override
the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
but the README says that's for stuff that tweaks binary compatibility.

So basically what I am asking is what environment variable should I
use? If CFLAGS is correct then does anyone have any issues if I change
the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
after OPT?


Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Toshio Kuratomi
On Wed, Jun 23, 2010 at 11:35:12PM +0200, Antoine Pitrou wrote:
 On Wed, 23 Jun 2010 17:30:22 -0400
 Toshio Kuratomi a.bad...@gmail.com wrote:
  Note that this assumption seems optimistic to me.  I started talking to
  Graham Dumpleton, author of mod_wsgi a couple years back because mod_wsgi
  and paste do decoding of bytes to unicode at different layers which caused
  problems for application level code that should otherwise run fine when
  being served by mod_wsgi or paste httpserver.  That was the beginning of
  Graham starting to talk about what the wsgi spec really should look like
  under python3 instead of the broken way that the appendix to the current
  wsgi spec states.
 
 Ok, but the reason would be that the WSGI spec is broken. Not Python 3
 itself.
 
Agreed.  Neither python2 nor python3 is broken.  It's the wsgi spec and the
implementation of that spec where things fall down.  From your first post,
I thought you were claiming that python3 was broken since web frameworks got
decoding right on python2 and I just wanted to defend python3 by showing
that python2 wasn't all sunshine and roses.

-Toshio




Re: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities

2010-06-23 Thread James Y Knight


On Jun 22, 2010, at 5:14 PM, Craig Younkins wrote:

  I suggest rewording the documentation for the method making it more
  clear what it should and should not be used for. I would like to see
  the method changed to properly escape single-quotes, but if it is
  not changed, the documentation should explicitly say this method
  does not make input safe for inclusion in HTML.


Well, it *does* make the input safe for inclusion in HTML...in a  
double-quoted attribute.


The docs could make it clearer that you should always use double-quotes
around your attribute values when using it, though, I agree.
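
To make the distinction concrete, here is a small sketch. The function
`cgi_style_escape` is a stand-in written for this example that mirrors what
cgi.escape does (escape &, < and >, plus the double quote when quote is
true); the payload string is made up. html.escape is shown only for
comparison; it was added to the standard library later, in Python 3.2.

```python
import html


def cgi_style_escape(s, quote=False):
    """Stand-in mirroring cgi.escape, for illustration only.

    Escapes &, < and >; with quote=True it also escapes the double quote,
    but it never touches the single quote.
    """
    s = s.replace("&", "&amp;")  # must run first so later entities survive
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


payload = "x' onmouseover='alert(1)"  # made-up attacker-controlled input

# Double-quoted attribute: any " in the input would become &quot;, so the
# value cannot close the attribute early.
double_quoted = '<a title="%s">' % cgi_style_escape(payload, quote=True)

# Single-quoted attribute: the ' passes through unescaped and ends the
# attribute value early, letting the input add its own attribute.
single_quoted = "<a title='%s'>" % cgi_style_escape(payload, quote=True)

# html.escape escapes the single quote as well (quote=True is its default).
single_quoted_ok = "<a title='%s'>" % html.escape(payload)
```

In the single_quoted case the markup now carries an onmouseover attribute
supplied by the input, which is exactly the situation the documentation
should warn about.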



Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7

2010-06-23 Thread Bill Janssen
See also http://gimper.net/viewtopic.php?f=18&t=3185.

Bill