Re: [Python-Dev] bytes / unicode
Ian Bicking writes: Just for perspective, I don't know if I've ever wanted to deal with a URL like that. Ditto, I do many times a day for Japanese media sites and Wikipedia. I know how it is supposed to work, and I know what a browser does with that, but so many tools will clean that URL up *or* won't be able to deal with it at all that it's not something I'll be passing around. I'm not suggesting that is something you want to be passing around; it's a presentation form, and I prefer that the internal form use Unicode. While it's nice to be correct about encodings, sometimes it is impractical. And it is far nicer to avoid the situation entirely. But you cannot avoid it entirely. Processing bytes means you are assuming ASCII compatibility. Granted, this is a pretty good assumption, especially if you got the bytes off the wire, but it's not universally so. Maybe it's a YAGNI, but one reason I prefer the decode-process-encode paradigm is that choice of codec is a specification of the assumptions you're making about encoding. So the Know-Nothing codec described above assumes just enough ASCII compatibility to parse the scheme. You could also have codecs which assume just enough ASCII compatibility to parse a hierarchical scheme, etc. That is, decoding content you don't care about isn't just inefficient, it's complicated and can introduce errors. That depends on the codec(s) used. Similarly I'd expect (from experience) that a programmer using Python would want to take the same approach, sticking with unencoded data in nearly all situations. Indeed, a programmer using Python 2 would want to do so, because all her literal strings are bytes by default (i.e., if she doesn't mark them with a u prefix), and interactive input is, too. This is no longer so obvious in Python 3, which takes the attitude that things that are expected to be human-readable should be processed as str.
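A minimal sketch of the decode-process-encode idea, assuming only enough ASCII compatibility to find the scheme. The Know-Nothing codec referred to above is not shown in this message, so the sketch approximates it with Python 3's ascii codec plus the surrogateescape error handler; split_scheme is a hypothetical name.

```python
def split_scheme(raw):
    """Return (scheme, rest) for a URL given as bytes (hypothetical helper).

    Only the scheme is interpreted as text; every other byte is carried
    through unchanged thanks to the surrogateescape error handler.
    """
    # ascii + surrogateescape never fails: bytes >= 0x80 become lone
    # surrogates U+DC80..U+DCFF instead of raising UnicodeDecodeError
    text = raw.decode('ascii', errors='surrogateescape')
    scheme, sep, rest = text.partition(':')
    if not sep:
        # No scheme delimiter at all: report no scheme, return input as-is
        return '', raw
    # Encoding with the same handler restores the original bytes exactly
    return scheme, rest.encode('ascii', errors='surrogateescape')


scheme, rest = split_scheme(b'http://example.com/R\xe9union')
print(scheme, rest)   # http b'//example.com/R\xe9union'
```

The non-ASCII byte 0xE9 is never interpreted, which is the point: the choice of codec states exactly how much the code assumes about the encoding.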
The obvious example in URI space is the file:/// URL, which you'll typically build up from a user string or a file browser, which will call the os.path stuff which returns str. Text editors and viewers will also use str for their buffers, and if they provide a way to fish out URIs for their users, they'll probably return str. I won't pretend to judge the relative importance of such use cases. But use cases for urllib which naturally favor str until you put the URI on the wire do exist, as does the debugging presentation aspect. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
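The file:/// case can be sketched with pathlib, which builds the URI as a str and percent-encodes it. This is a minimal sketch assuming a POSIX system and a made-up file name; the value only needs to become bytes when it is put on the wire.

```python
from pathlib import Path

# A file name as a user or a file browser would supply it: a str
path = Path('/tmp/Réunion notes.txt')

# as_uri() percent-encodes the path (via the filesystem encoding,
# normally UTF-8 on POSIX) and still returns a str
uri = path.as_uri()
print(uri)   # typically file:///tmp/R%C3%A9union%20notes.txt

# Only when the URI goes on the wire does it need to become bytes;
# after percent-encoding it is plain ASCII
wire = uri.encode('ascii')
print(wire)
```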
Re: [Python-Dev] red buildbots on 2.7
On 22 Jun, 2010, at 19:05, Alexander Belopolsky wrote: On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren ronaldousso...@mac.com wrote: .. Both are valid fixes, both have both advantages and disadvantages. Your proposal: * Reverts to the behavior in 2.6 * Ensures that posix.getgroups and posix.setgroups are internally consistent It is also very simple and since the posix module worked fine on OSX for years without _DARWIN_C_SOURCE, I think this is a very low risk change. I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the compile-time environment in configure would be different than during the compilation of posixmodule. That is, the feature checks in configure (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when _DARWIN_C_SOURCE is active). And man compat(5) says: quote 32-BIT COMPILATION Defining _NONSTD_SOURCE causes library and kernel calls to behave as closely to Mac OS X 10.3's library and kernel calls as possible. Any behavioral changes in this mode are documented in the LEGACY sections of the individual function calls. Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3. Defining _POSIX_C_SOURCE also removes functions, types, and other interfaces that are not part of SUSv3 from the normal C namespace, unless _DARWIN_C_SOURCE is also defined (i.e., _DARWIN_C_SOURCE is _POSIX_C_SOURCE with non-POSIX extensions). In any of these cases, the _DARWIN_FEATURE_UNIX_CONFORMANCE feature macro will be defined to the SUS conformance level (it is undefined otherwise).
Starting in Mac OS X 10.5, if none of the macros _NONSTD_SOURCE, _POSIX_C_SOURCE or _DARWIN_C_SOURCE are defined, and the environment variable MACOSX_DEPLOYMENT_TARGET is either undefined or set to 10.5 or greater (or equivalently, the gcc(1) option -mmacosx-version-min is either not specified or set to 10.5 or greater), then UNIX conformance will be on by default, and non-POSIX extensions will also be available (this is the equivalent of defining _DARWIN_C_SOURCE). For version values less that 10.5, UNIX conformance will be off (the equivalent of defining _NONSTD_SOURCE). /quote My interpretation of that is that _DARWIN_C_SOURCE should be used to get SUSv3 APIs while keeping access to Darwin-specific APIs as well. When you deploy to 10.5 or later the compiler will set _DARWIN_C_SOURCE for you unless you set one of the other feature-selecting defines. My proposal: * Uses the newer ABI, which is more likely to be the one Apple wants you to use I don't think so. In getgroups(2) I see LEGACY DESCRIPTION If _DARWIN_C_SOURCE is defined, getgroups() can return more than {NGROUPS_MAX} groups. This suggests that this is legacy behavior. Newer applications should use getgrouplist instead. I honestly don't know why this is in the LEGACY DESCRIPTION. But as the functionality you get with _DARWIN_C_SOURCE was added later, I'd say that the behavior is intentional and not legacy. By not defining _DARWIN_C_SOURCE we don't necessarily get full UNIX behavior for other APIs. * Is compatible with system tools (that is, posix.getgroups() agrees with id(1)) I have not tested this recently, but I think if you exec id from a program after a call to setgroups(), it will return process groups, not user groups. * Is compatible with /usr/bin/python I am sure that once this issue is fixed upstream, Apple will pick it up with the next version. Haha.
Apple explicitly added patches to get the current behavior instead of the default, so what makes you think that they'll revert to the older behavior? * results in posix.getgroups not reflecting results of posix.setgroups This effectively substitutes getgrouplist called on the current user for getgroups. In 3.x, I believe the correct action will be to provide direct access to getgrouplist, which, while not POSIX (yet?), is widely available. I don't mind adding getgrouplist, but that issue is separate from this one. BTW. Apparently getgrouplist is posix (http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/libc.html), although this isn't a requirement for being added to the posix module. It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch is more complicated and the library function we use can be considered to be broken. Ronald
Re: [Python-Dev] bytes / unicode
James Y Knight writes: The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. This is the world we already have, modulo s/utf8/ascii + random GR charset/. It doesn't work, and it can't, in Japan or China or Korea, and probably not in Russia or Kazakhstan, for some time yet. That's not to say that byte-oriented processing doesn't have its place. And in many cases it's reasonable (but not secure or bulletproof!) to assume ASCII compatibility of the byte stream, passing through syntactically unimportant bytes verbatim. Syntactic analysis of such streams will surely have a lot in common with that for text streams, so the same tools should be available. (That's the point of Guido's endorsement of polymorphism, AIUI.) But it's just not reasonable to assume that will work in a context where text streams from various sources are mixed with byte streams. In that case, the byte streams need to be converted to text before mixing. (You can't do it the other way around because there is no guarantee that the text is compatible with the current encoding of the byte stream, nor that all the byte streams have the same encoding.) We do need str-based implementations of modules like urllib.
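To make the mixing problem concrete, here is a small sketch with two hypothetical sources that deliver the same word in different encodings (one Latin-1, one UTF-8). Concatenating them as bytes produces a stream that no single codec reads back correctly; decoding each with its own codec first and then mixing as text works.

```python
# Two sources supply the same word in different encodings
latin1_src = 'Réunion'.encode('latin-1')   # b'R\xe9union'
utf8_src = 'Réunion'.encode('utf-8')       # b'R\xc3\xa9union'

# Mixing at the byte level: no single codec decodes the result correctly
mixed_bytes = latin1_src + b' / ' + utf8_src
try:
    mixed_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    print('utf-8 fails:', exc.reason)
print('latin-1 gives:', mixed_bytes.decode('latin-1'))   # Réunion / RÃ©union

# Converting each byte stream to text first, then mixing, works
mixed_text = latin1_src.decode('latin-1') + ' / ' + utf8_src.decode('utf-8')
print(mixed_text)   # Réunion / Réunion
```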
Re: [Python-Dev] bytes / unicode
Nick Coghlan wrote: On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg m...@egenix.com wrote: It would be great if we could have something like the above as a builtin method: x.split(''.as(x)) As per my other message, another possible (and reasonably intuitive) spelling would be: x.split(x.coerce('')) You are right: there are two ways to adapt one object to another. You can either adapt object 1 to object 2 or object 2 to object 1. This is what the Python2 coercion protocol does for operators. I just wanted to avoid using that term, since Python3 removes the coercion protocol. Writing it as a helper function is also possible, although it would be trickier to remember the correct argument ordering:

```python
import sys

def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
    # Already the same type as the target: nothing to convert
    if isinstance(obj, type(target)):
        return obj
    if encoding is None:
        encoding = sys.getdefaultencoding()
    try:
        convert = obj.decode   # bytes -> str
    except AttributeError:
        convert = obj.encode   # str -> bytes
    return convert(encoding, errors)
```

x.split(coerce_to(x, ''))

Perhaps something to discuss at the language summit at EuroPython. Too bad we can't add such porting enhancements to Python2 anymore. Well, we can if we really want to, it just entails convincing Benjamin to reschedule the 2.7 final release. Given the UserDict/ABC/old-style classes issue, there's a fair chance there's going to be at least one more 2.7 RC anyway. That said, since this kind of coercion can be done in a helper function, that should be adequate for the 2.x to 3.x conversion case (for 2.x, the helper function can be defined to just return the second argument since bytes and str are the same type, while the 3.x version would look something like the code above) True. Note that the point of using a builtin method was to get better performance. Such type adaptions are often needed in loops, so adding a few extra Python function calls just to convert a str object to a bytes object or vice-versa is a bit much overhead.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 23 2010)
Re: [Python-Dev] bytes / unicode
On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg m...@egenix.com wrote: Note that the point of using a builtin method was to get better performance. Such type adaptions are often needed in loops, so adding a few extra Python function calls just to convert a str object to a bytes object or vice-versa is a bit much overhead. I actually agree with that, I just think we need more real-world experience as to what works with the Python 3 text model before we start messing with the APIs for the builtin objects (fair point that "coerce" is a loaded term given the existence of the old coercion protocol. It's the right word for the task though). One of the key points coming out of this thread (to my mind) is the lack of a Text ABC or other way of making an object that can be passed to functions expecting a str instance with a reasonable expectation of having it work. Are there some core string capabilities that can be identified and then expanded out to a full str-compatible API? (i.e. something along the lines of what collections.MutableMapping now provides for dict-alikes). However, even if something like that was added, PJE is correct in pointing out that builtin strings still don't play well with others in many cases (usually due to underlying optimisations or other sound reasons, but perhaps sometimes gratuitously). Most of the string binary operations can be dealt with through their reflected forms, but str.__mod__ will never return NotImplemented, __contains__ has no reflected form and the actual method calls are of course right out (e.g. the arguments to str.join() or str.split() calls have no ability to affect the type of the result). Third-party number implementations couldn't provide comparable functionality to builtin int and long objects until the __index__ protocol was added.
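These limits can be observed directly in Python 3. The sketch below uses a hypothetical wrapper class, TextLike (deliberately not a str subclass): str + TextLike reaches the reflected __radd__, but str % TextLike never reaches __rmod__, and neither the in operator nor str.join() gives the wrapper any say.

```python
class TextLike:
    """Hypothetical str-alike that wraps a str instead of subclassing it."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __radd__(self, other):
        # Reached for str + TextLike, since str cannot add a TextLike itself
        return TextLike(other + self.text)

    def __rmod__(self, other):
        # Never reached for str % TextLike: str.__mod__ formats it instead
        return TextLike('rmod called')


added = 'abc' + TextLike('def')
print(type(added).__name__, added)           # TextLike abcdef

formatted = '%s!' % TextLike('x')
print(type(formatted).__name__, formatted)   # str x!

# No reflected form for 'in', and join() insists on real str items
for attempt in (lambda: TextLike('b') in 'abc',
                lambda: ''.join(['a', TextLike('b')])):
    try:
        attempt()
    except TypeError as exc:
        print('TypeError:', exc)
```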
Perhaps PJE is right that what this is really crying out for is a way to have third-party real string implementations so that there can actually be genuine experimentation in the Unicode handling space outside the language core (comparable to the difference between the "you can turn me into an int" __int__ method and the "I am an int equivalent" __index__ method). That may be tapping in a nail with a sledgehammer (and would raise significant moratorium questions if pursued further), but I think it's a valid question to at least ask. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
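The integer side of that analogy can be shown with two hypothetical classes: one that only knows how to be converted to an int (__int__) and one that declares itself an int equivalent (__index__). Only the latter is accepted where Python needs a real integer, such as a list index. The sketch covers integers only; no comparable "I am a str equivalent" hook exists for text.

```python
import operator


class IntConvertible:
    """Can be turned into an int, but does not claim to be one."""
    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


class IntEquivalent:
    """Declares itself a lossless int stand-in via __index__."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


items = ['a', 'b', 'c']
print(items[IntEquivalent(1)])            # b
print(operator.index(IntEquivalent(2)))   # 2
print(int(IntConvertible(2)))             # 2

try:
    items[IntConvertible(1)]              # __int__ alone is not enough
except TypeError as exc:
    print('TypeError:', exc)
```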
Re: [Python-Dev] WPython 1.1 was released
On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote: I've released WPython 1.1, which brings many optimizations and refactorings. For those of us who don't know what WPython is, and are too lazy, too busy, or reading their email off-line, could you give us a short, one-paragraph description of what it is? Actually, since I'm none of the above, I'll answer my own question: WPython is an implementation of Python that uses 16-bit wordcodes instead of byte code, and claims to have various performance benefits from doing so. It looks like good work, thank you. -- Steven D'Aprano
Re: [Python-Dev] WPython 1.1 was released
2010/6/23 Steven D'Aprano st...@pearwood.info On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote: I've released WPython 1.1, which brings many optimizations and refactorings. For those of us who don't know what WPython is, and are too lazy, too busy, or reading their email off-line, could you give us a one short paragraph description of what it is? Actually, since I'm none of the above, I'll answer my own question: WPython is an implementation of Python that uses 16-bit wordcodes instead of byte code, and claims to have various performance benefits from doing so. It looks like good work, thank you. -- Steven D'Aprano Hi Steven, sorry, I made a mistake, assuming that the project was known. WPython is a CPython 2.6.4 implementation that uses wordcodes instead of bytecodes. A wordcode is a word (16 bits, two bytes, in this case) used to represent VM opcodes. This new encoding made it possible to simplify the execution of the virtual machine's main cycle, improving understanding, maintenance, and extensibility; less space is required on average, and execution speed is improved too. Cesare
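WPython's actual instruction format is not described in the message. For a rough feel of what 16-bit code units look like, CPython itself later moved to a two-byte "wordcode" format in 3.6 (one byte of opcode followed by one byte of argument). The sketch below splits a function's raw code into such units; it illustrates the general idea only and is not WPython's encoding. wordcode_units is a hypothetical name.

```python
import opcode


def wordcode_units(func):
    """Split func's raw code into (opname, oparg) pairs, two bytes per unit.

    Assumes CPython 3.6+, where every code unit is exactly 2 bytes;
    arguments wider than 8 bits are built up with EXTENDED_ARG units.
    """
    raw = func.__code__.co_code
    assert len(raw) % 2 == 0, 'expected 16-bit code units'
    return [(opcode.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]


for name, arg in wordcode_units(lambda x: x + 1):
    print(f'{name:<20} {arg}')
```

The exact opcode names vary between CPython versions, so no specific listing is shown here.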
Re: [Python-Dev] email package status in 3.X
Guido van Rossum wrote: On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote: Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates. FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the meantime I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected. +1 The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us take a major step forward. This won't be a campaign with a victorious charge over some imaginary finish line. regards Steve -- Steve Holden, Holden Web LLC http://www.holdenweb.com/
Re: [Python-Dev] red buildbots on 2.7
On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote: .. I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the compile-time environment in configure would be different than during the compilation of posixmodule. That is, the feature checks in configure (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when _DARWIN_C_SOURCE is active). I agree. Messing with compatibility macros outside of pyconfig.h is not a good idea. Martin's hack, while likely to work in most cases, is still a hack. I believe, however, that we can undefine _DARWIN_C_SOURCE globally, at least on 10.4 and higher. I grepped through the headers on my 10.6 system and I notice that the majority of checks for _DARWIN_C_SOURCE are in the form of #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) According to a comment in configure, # On Mac OS X 10.4, defining _POSIX_C_SOURCE or _XOPEN_SOURCE # disables platform specific features beyond repair. # On Mac OS X 10.3, defining _POSIX_C_SOURCE or _XOPEN_SOURCE # has no effect, don't bother defining them _POSIX_C_SOURCE is already undefined in Python headers, so undefining _DARWIN_C_SOURCE will have no effect on the majority of checks. I was able to find very few exceptions: some cases check _XOPEN_SOURCE instead of or in addition to _POSIX_C_SOURCE before ignoring _DARWIN_C_SOURCE: /usr/include/grp.h:#if !defined(_XOPEN_SOURCE) || defined(_DARWIN_C_SOURCE) /usr/include/pwd.h:#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE) .. Since _XOPEN_SOURCE is similarly undefined in Python headers, these cases are unaffected as well.
This leaves a handful of cases where Apple provides additional macros for fine-grained control: /usr/include/stdio.h:#if defined(__DARWIN_10_6_AND_LATER) && (defined(_DARWIN_UNLIMITED_STREAMS) || defined(_DARWIN_C_SOURCE)) /usr/include/unistd.h:#if defined(_DARWIN_UNLIMITED_GETGROUPS) || defined(_DARWIN_C_SOURCE) The second line above is our dear friend, and the _DARWIN_C_SOURCE behavior conditioned on the first line can be enabled by defining the _DARWIN_UNLIMITED_STREAMS macro. I believe _DARWIN_C_SOURCE casts its net too wide and more targeted macros should be used instead. .. Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3. I cannot reconcile this with the !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) logic that I see in the headers.
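The argument above is essentially a truth table, and it can be checked mechanically. The sketch below models two of the quoted header guards as predicates over the set of defined macros. It is a simplification that only evaluates the #if expressions as written and does not model anything sys/cdefs.h or the deployment target may define implicitly. The function names are hypothetical.

```python
def common_guard(defined):
    # #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
    return '_POSIX_C_SOURCE' not in defined or '_DARWIN_C_SOURCE' in defined


def unistd_getgroups_guard(defined):
    # #if defined(_DARWIN_UNLIMITED_GETGROUPS) || defined(_DARWIN_C_SOURCE)
    return ('_DARWIN_UNLIMITED_GETGROUPS' in defined
            or '_DARWIN_C_SOURCE' in defined)


# _POSIX_C_SOURCE is never in the set: Python's headers leave it undefined
scenarios = [
    ('with _DARWIN_C_SOURCE', {'_DARWIN_C_SOURCE'}),
    ('without', set()),
    ('targeted macro only', {'_DARWIN_UNLIMITED_GETGROUPS'}),
]
for label, macros in scenarios:
    print(f'{label:<22} common={common_guard(macros)} '
          f'getgroups={unistd_getgroups_guard(macros)}')
```

With _POSIX_C_SOURCE undefined, the common guard is true either way, so dropping _DARWIN_C_SOURCE only changes the handful of guards that name it alongside a targeted macro.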
Re: [Python-Dev] bytes / unicode
At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote: I suspect the practical problem here is that there's no CharacterString ABC That, and the absence of a string coercion protocol so that mixing your custom string with standard strings will do the right thing for your intended use.
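For comparison, the collections.MutableMapping model mentioned elsewhere in this thread (collections.abc.MutableMapping in current Python) lets a dict-alike implement five methods and inherit the rest; nothing comparable exists for text. A minimal sketch with a hypothetical LowerDict class:

```python
from collections.abc import MutableMapping


class LowerDict(MutableMapping):
    """Dict-alike with case-insensitive keys; only five methods are hand-written."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)   # update() is inherited from the mixin

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


headers = LowerDict({'Content-Type': 'text/plain'})
print(headers.get('CONTENT-TYPE'))   # text/plain (get() is inherited)
print('content-type' in headers)     # True (__contains__ is inherited)
```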
[Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote: .. * [Ronald's proposal] results in posix.getgroups not reflecting results of posix.setgroups This effectively substitutes getgrouplist called on the current user for getgroups. In 3.x, I believe the correct action will be to provide direct access to getgrouplist, which, while not POSIX (yet?), is widely available. I don't mind adding getgrouplist, but that issue is separate from this one. BTW. Apparently getgrouplist is posix (http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/libc.html), although this isn't a requirement for being added to the posix module. (The link you provided leads to the Linux Standard Base Core Specification, which is different from POSIX, but the distinction is not relevant for our discussion.) It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch is more complicated and the library function we use can be considered to be broken. Let me try to formulate what the disagreement is. There are two different group lists that can be associated with a running process: 1) The list of current supplementary group IDs maintained by the system for each process and stored in per-process system tables; and 2) The list of the groups that include the uid under which the process is running as a member. The first list is returned by the getgroups system call and the second can be obtained using system database access functions as follows: pw = getpwuid(getuid()); getgrouplist(pw->pw_name, ..) The first list can be modified by privileged processes using the setgroups system call, while the second changes when system databases change. The problem that _DARWIN_C_SOURCE introduces is that it replaces the system getgroups with a database query, effectively making the true process' list of supplementary group IDs inaccessible to programs.
See source code at http://www.opensource.apple.com/source/Libc/Libc-594.1.4/sys/getgroups.c. The problem is complicated by the fact that the true getgroups call on OS X appears to truncate the list of groups to NGROUPS_MAX=16. Note, however, that it is not clear whether the system call truncates the list or the underlying process tables are limited to 16 entries and additional groups are ignored when the process is created. In my view, getgroups and getgrouplist are two fundamentally different operations and both should be provided by the os module. Redefining os.getgroups to invoke getgrouplist instead of the system getgroups on one particular platform to work around that platform's system call limitation is not right.
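Both lists described above can be read from Python 3 today: os.getgroups() wraps getgroups(2), and os.getgrouplist() (added in Python 3.3, Unix only) wraps getgrouplist(3). A minimal sketch comparing them for the current process; group_lists is a hypothetical helper, and on macOS what os.getgroups() returns depends on how the interpreter was built, so the comparison is only indicative.

```python
import os
import pwd


def group_lists():
    """Return (process groups, database groups) for the current process.

    Hypothetical helper; raises KeyError if the real uid has no passwd entry.
    """
    # 1) Supplementary group IDs held in the per-process system table
    process_groups = sorted(os.getgroups())
    # 2) Groups the user belongs to according to the group database,
    #    i.e. getgrouplist(getpwuid(getuid())->pw_name, ...) in C
    pw = pwd.getpwuid(os.getuid())
    database_groups = sorted(os.getgrouplist(pw.pw_name, pw.pw_gid))
    return process_groups, database_groups


try:
    proc, db = group_lists()
except KeyError:
    print('current uid has no passwd entry')
else:
    print('getgroups():   ', proc)
    print('getgrouplist():', db)
    print('same groups' if set(proc) == set(db) else 'lists differ')
```

After a privileged setgroups() call the first list changes while the second does not, which is exactly the distinction at issue.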
Re: [Python-Dev] red buildbots on 2.7
On 23 Jun, 2010, at 04:06 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote: .. I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the compile-time environment in configure would be different than during the compilation of posixmodule. That is, the feature checks in configure (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when _DARWIN_C_SOURCE is active). I agree. Messing with compatibility macros outside of pyconfig.h is not a good idea. Martin's hack, while likely to work in most cases, is still a hack. I believe, however, that we can undefine _DARWIN_C_SOURCE globally, at least on 10.4 and higher. I grepped through the headers on my 10.6 system and I notice that the majority of checks for _DARWIN_C_SOURCE are in the form of .. As I wrote, the system will assume _DARWIN_C_SOURCE is set when you don't set _POSIX_C_SOURCE or other feature macros. Working around that is a hack that I don't wish to support. .. Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3. I cannot reconcile this with the !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) logic that I see in the headers. This seems to be arranged in sys/cdefs.h. I honestly don't care how this is done; the documentation clearly says that this happens, and that indicates that _DARWIN_C_SOURCE selects the API Apple would like you to use. Anyway, why is this discussion on python-dev instead of in the issue tracker? BTW, IMHO resolution of this issue can wait until after 2.7.0; there is always 2.7.1 and I don't think we need to rush this (the issue has been dormant for quite a while). Ronald
Re: [Python-Dev] bytes / unicode
Stephen J. Turnbull wrote: We do need str-based implementations of modules like urllib. Why would that be? URLs aren't text, and never will be. The fact that to the eye they may seem to be text-ish doesn't make them text. This *is* a case where "don't make me think" is a losing proposition: programmers who work with URLs in any non-opaque way as text are eventually going to be bitten by this issue no matter how hard we wave our hands. Tres. -- Tres Seaver tsea...@palladion.com Palladion Software
Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
In my previous post, I forgot to include the link to the tracker issue where this problem is being worked on. http://bugs.python.org/issue7900 I'll repost my message there as an issue comment, so that a more detailed technical discussion can continue there.
Re: [Python-Dev] bytes / unicode
On Wed, Jun 23, 2010 at 8:30 AM, Tres Seaver tsea...@palladion.com wrote: Stephen J. Turnbull wrote: We do need str-based implementations of modules like urllib. Why would that be? URLs aren't text, and never will be. The fact that to the eye they may seem to be text-ish doesn't make them text. This *is* a case where "don't make me think" is a losing proposition: programmers who work with URLs in any non-opaque way as text are eventually going to be bitten by this issue no matter how hard we wave our hands. This has been asserted and contested several times now, and I don't see the two positions getting any closer. So I propose that we drop the discussion "are URLs text or bytes" and try to find something more pragmatic to discuss. For example: how we can make the suite of functions used for URL processing more polymorphic, so that each developer can choose for herself how URLs need to be treated in her application. -- --Guido van Rossum (python.org/~guido)
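One way to read the polymorphism suggestion is to let each function derive its literals from the type of its input, so the same code serves str and bytes. A minimal sketch with hypothetical helpers _literal and split_query (urllib.parse in Python 3.2 and later took a broadly similar approach, accepting either str or ASCII bytes in most of its parsing functions):

```python
def _literal(text, like):
    """Return the ASCII literal text as the same type as like (hypothetical)."""
    if isinstance(like, (bytes, bytearray)):
        return text.encode('ascii')
    return text


def split_query(query):
    """Split 'a=1&b=2' into (key, value) pairs of the same type as query."""
    amp, eq = _literal('&', query), _literal('=', query)
    pairs = []
    for field in query.split(amp):
        if field:                      # skip empty fields, e.g. a trailing '&'
            key, _, value = field.partition(eq)
            pairs.append((key, value))
    return pairs


print(split_query('lang=fr&q=R%C3%A9union'))   # str in, str pairs out
print(split_query(b'lang=fr&q=R%E9union&'))    # bytes in, bytes pairs out
```

The caller, not the library, decides whether a URL is handled as text or as bytes; the function only assumes that its delimiters are ASCII.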
Re: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities
http://bugs.python.org/issue9061 On Tue, Jun 22, 2010 at 5:29 PM, Bill Janssen jans...@parc.com wrote: Craig Younkins cyounk...@gmail.com wrote: cgi.escape never escapes single quote characters, which can easily lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, but a quick search reveals many are using cgi.escape for HTML attribute escaping. Did you file a bug report? Bill
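To make the attribute problem concrete: when a value is placed inside a single-quoted HTML attribute, an escaper that leaves ' alone lets the value close the attribute early and inject new attributes. cgi.escape has since been removed from Python 3; its replacement, html.escape(), escapes both quote characters by default. A minimal sketch with a hypothetical render_link helper:

```python
import html


def render_link(href):
    """Build an <a> tag whose href is single-quoted (hypothetical helper)."""
    # quote=True (the default) also escapes " and ', which is what
    # protects a single-quoted attribute value
    return "<a href='%s'>link</a>" % html.escape(href, quote=True)


payload = "x' onmouseover='alert(1)"
print(render_link(payload))
# <a href='x&#x27; onmouseover=&#x27;alert(1)'>link</a>

# With quote=False the single quote survives and would end the attribute
print(html.escape(payload, quote=False))
```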
Re: [Python-Dev] bytes / unicode
On Jun 23, 2010, at 08:43 AM, Guido van Rossum wrote: So I propose that we drop the discussion "are URLs text or bytes" and try to find something more pragmatic to discuss. email has exactly the same question, and the answer is "yes". <wink> For example: how we can make the suite of functions used for URL processing more polymorphic, so that each developer can choose for herself how URLs need to be treated in her application. I think email package hackers should watch this effort closely. RDM has written some stuff up on how we think we're going to handle this, though it's probably pretty email-package specific. Maybe there's a better, general, or conventional approach lurking around somewhere. http://wiki.python.org/moin/Email%20SIG -Barry
Re: [Python-Dev] bytes / unicode
Tres Seaver tsea...@palladion.com wrote: Stephen J. Turnbull wrote: We do need str-based implementations of modules like urllib. Why would that be? URLs aren't text, and never will be. The fact that to the eye they may seem to be text-ish doesn't make them text. This URLs are exactly text (strings, representable as Unicode strings in Py3K), and were designed as such from the start. The fact that some of the things tunneled or carried in URLs are string representations of non-string data shouldn't obscure that point. They're not text-ish, they're text. They're not opaque, either; they break down in well-specified ways, mainly into strings. The trouble comes in when we try to go beyond the spec, or handle things that don't conform to the spec. Sure, a path component of a URI might actually be a %-escaped sequence of arbitrary bytes, even bytes that don't represent a string in any known encoding, but that's only *after* reversing the %-escapes, which should happen in a scheme-specific piece of code, not in generic URL parsing or manipulation. Bill
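The split Bill describes, generic parsing on text first and reversing %-escapes later in scheme-specific code, maps onto the Python 3 standard library. A minimal sketch with a made-up URL whose path escapes are not valid UTF-8:

```python
from urllib.parse import unquote, unquote_to_bytes, urlsplit

url = 'http://example.com/R%E9union/%FF?lang=fr'

# Generic parsing treats the URL as text and leaves %-escapes alone
parts = urlsplit(url)
print(parts.netloc, parts.path, parts.query)   # example.com /R%E9union/%FF lang=fr

# Reversing the escapes is a separate, scheme-specific step; the result
# may be bytes in no particular encoding
raw_path = unquote_to_bytes(parts.path)
print(raw_path)                                # b'/R\xe9union/\xff'

# Forcing those bytes into text (UTF-8 with errors='replace' by default) is lossy
print(unquote(parts.path) == '/R\ufffdunion/\ufffd')   # True
```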
Re: [Python-Dev] bytes / unicode
On Wed, Jun 23, 2010 at 10:30 AM, Tres Seaver tsea...@palladion.com wrote:
> Stephen J. Turnbull wrote:
>> We do need str-based implementations of modules like urllib.
>
> Why would that be? URLs aren't text, and never will be. The fact that to the eye they may seem to be text-ish doesn't make them text. This *is* a case where "don't make me think" is a losing proposition: programmers who work with URLs in any non-opaque way as text are eventually going to be bitten by this issue no matter how hard we wave our hands.

HTML is text, and URLs are embedded in that text, so it's easy to get a URL that is text. Though, with a little testing, I notice that text alone can't tell you what the right URL really is (at least the intended URL when unsafe characters are embedded in HTML).

To test I created two pages, one in Latin-1 and another in UTF-8, and put in the link: ./test.html?param=Réunion

On a Latin-1 page it created a link to test.html?param=R%E9union and on a UTF-8 page it created a link to test.html?param=R%C3%A9union (the second link displays in the URL bar as test.html?param=Réunion but copies with percent encoding). Though if you link to ./Réunion.html then both pages create UTF-8 links. And both pages also link http://Réunion.com to http://xn--runion-bva.com/.

So really neither bytes nor text works completely; query strings receive the encoding of the page, which would be handled transparently if you worked on the page's bytes. Path and domain are consistently encoded with UTF-8 and punycode respectively, and so would be handled best when treated as text. And of course if the page uses a non-ASCII-compatible encoding you really must handle encodings before the URL is sensible.

Another issue here is that there's no encoding for turning a URL into bytes if the URL is not already ASCII.
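The three behaviours observed in the browser test can be reproduced with Python 3's `urllib.parse.quote` (via its `encoding` parameter) and the built-in `idna` codec. This is only a sketch of the encodings involved, not a claim about how any browser implements them internally.

```python
from urllib.parse import quote

value = 'Réunion'

# Query string: percent-encoded using the *page's* encoding.
latin1_query = quote(value, encoding='latin-1')  # R%E9union     (Latin-1 page)
utf8_query = quote(value, encoding='utf-8')      # R%C3%A9union  (UTF-8 page)

# Path: UTF-8 regardless of the page encoding ('.' and '/' stay unescaped).
path = quote('./Réunion.html')                   # ./R%C3%A9union.html

# Host name: IDNA (nameprep, including lowercasing, then punycode),
# applied to each dot-separated label.
host = 'Réunion.com'.encode('idna')              # b'xn--runion-bva.com'

print(latin1_query, utf8_query, path, host)
```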
A proper way to encode a URL would be:

(Totally as an aside, as I remind myself of new module names I notice it's not easy to google specifically for Python 3 docs, e.g. "python 3 urlsplit" gives me 2.6 docs.)

    from urllib.parse import urlsplit, urlunsplit

    def encode_http_url(url, page_encoding='ASCII', errors='strict'):
        scheme, netloc, path, query, fragment = urlsplit(url)
        scheme = scheme.encode('ASCII', errors)
        auth = port = None
        if '@' in netloc:
            auth, netloc = netloc.split('@', 1)
        if ':' in netloc:
            netloc, port = netloc.split(':', 1)
        # str.encode('idna') converts every dot-separated label;
        # encodings.idna.ToASCII only handles a single label.
        netloc = netloc.encode('idna')
        if port:
            netloc = netloc + b':' + port.encode('ASCII', errors)
        if auth:
            netloc = auth.encode('UTF-8', errors) + b'@' + netloc
        path = path.encode('UTF-8', errors)
        query = query.encode(page_encoding, errors)
        fragment = fragment.encode('UTF-8', errors)
        return urlunsplit_bytes((scheme, netloc, path, query, fragment))

Where urlunsplit_bytes handles bytes (urlunsplit does not). It's helpful for me at least to look at that code specifically:

    def urlunsplit(components):
        scheme, netloc, url, query, fragment = components
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/':
                url = '/' + url
            url = '//' + (netloc or '') + url
        if scheme:
            url = scheme + ':' + url
        if query:
            url = url + '?' + query
        if fragment:
            url = url + '#' + fragment
        return url

In this case it really would be best to have Python 2's system where things are coerced to ASCII implicitly. Or, more specifically, if all those string literals in that routine could be implicitly converted to bytes using ASCII. Conceptually I think this is reasonable, as for URLs (at least with HTTP, but in practice I think this applies to all URLs) the ASCII bytes really do have meaning. That is, '/' (*in the context of urlunsplit*) really is \x2f specifically. Or another example, making a GET request really means sending the bytes \x47\x45\x54 and there is no other set of bytes that has that meaning.
The WebSockets specification, for instance, defines things like colon: http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76#page-5 -- in an earlier version they even used bytes to describe HTTP (http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-54#page-13), though this annoyed many people.

--
Ian Bicking | http://blog.ianbicking.org
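The encoder above calls `urlunsplit_bytes`, which did not exist at the time. Below is a minimal sketch of such a helper that mirrors the quoted `urlunsplit` source step by step, but with bytes literals. The name is the hypothetical one from the message, not a stdlib function. On Python 3.2 and later it is unnecessary, since `urlunsplit` accepts bytes components directly.

```python
from urllib.parse import uses_netloc

def urlunsplit_bytes(components):
    """Hypothetical bytes-only counterpart of urllib.parse.urlunsplit."""
    scheme, netloc, url, query, fragment = components
    # uses_netloc is a list of str scheme names, so compare a decoded copy.
    has_netloc = scheme and scheme.decode('ascii') in uses_netloc
    if netloc or (has_netloc and url[:2] != b'//'):
        if url and url[:1] != b'/':
            url = b'/' + url
        url = b'//' + (netloc or b'') + url
    if scheme:
        url = scheme + b':' + url
    if query:
        url = url + b'?' + query
    if fragment:
        url = url + b'#' + fragment
    return url

print(urlunsplit_bytes((b'http', b'xn--runion-bva.com', b'R%C3%A9union', b'', b'top')))
# b'http://xn--runion-bva.com/R%C3%A9union#top'
```

Every literal here (`b'/'`, `b'//'`, `b':'`, `b'?'`, `b'#'`) is exactly the ASCII byte the message argues is meaningful in a URL.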
Re: [Python-Dev] bytes / unicode
Guido van Rossum gu...@python.org wrote:
> So I propose that we drop the discussion "are URLs text or bytes" and try to find something more pragmatic to discuss.
>
> For example: how we can make the suite of functions used for URL processing more polymorphic, so that each developer can choose for herself how URLs need to be treated in her application.

While I agree with "find something more pragmatic to discuss", it also seems to me that introducing polymorphic URL processing might make things more confusing and error-prone.

The bigger problem seems to be that we're revisiting the design discussion about urllib.parse from the summer of 2008. See http://bugs.python.org/issue3300 if you want to recall how we hashed this out 2 years ago. I didn't particularly like that design, but I had to go off on vacation :-), and things got settled while I was away. I haven't heard much from Matt Giuca since he stopped by and lobbed that patch into the standard library. But since Guido is the one who settled it, why are we talking about it again?

Bill
Re: [Python-Dev] bytes / unicode
Oops, I forgot some important quoting (important for the algorithm, maybe not actually for the discussion)...

    from urllib.parse import urlsplit, urlunsplit

    # urllib.parse.quote both always returns str, and is not as conservative
    # in quoting as required here...
    def quote_unsafe_bytes(b):
        result = []
        for c in b:
            if c < 0x20 or c >= 0x80:
                result.extend(('%%%02X' % c).encode('ASCII'))
            else:
                result.append(c)
        return bytes(result)

    def encode_http_url(url, page_encoding='ASCII', errors='strict'):
        scheme, netloc, path, query, fragment = urlsplit(url)
        scheme = scheme.encode('ASCII', errors)
        auth = port = None
        if '@' in netloc:
            auth, netloc = netloc.split('@', 1)
        if ':' in netloc:
            netloc, port = netloc.split(':', 1)
        # str.encode('idna') converts every dot-separated label;
        # encodings.idna.ToASCII only handles a single label.
        netloc = netloc.encode('idna')
        if port:
            netloc = netloc + b':' + port.encode('ASCII', errors)
        if auth:
            netloc = quote_unsafe_bytes(auth.encode('UTF-8', errors)) + b'@' + netloc
        path = quote_unsafe_bytes(path.encode('UTF-8', errors))
        query = quote_unsafe_bytes(query.encode(page_encoding, errors))
        fragment = quote_unsafe_bytes(fragment.encode('UTF-8', errors))
        return urlunsplit_bytes((scheme, netloc, path, query, fragment))

--
Ian Bicking | http://blog.ianbicking.org
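On Python 3.2 and later, where `urlunsplit` accepts bytes components, the corrected encoder can be exercised end to end without a separate bytes helper. The sketch below restates it in that form. It substitutes `str.encode('idna')` (which handles a whole dotted host name) for `encodings.idna.ToASCII` (which handles one label). Like the version above, it leaves the space character (0x20) unquoted. Treat it as an illustration of the per-component encoding policy, not as a complete or standards-checked URL encoder.

```python
from urllib.parse import urlsplit, urlunsplit

def quote_unsafe_bytes(b):
    """Percent-encode control bytes and non-ASCII bytes; leave the rest alone."""
    result = bytearray()
    for c in b:  # iterating over bytes yields ints
        if c < 0x20 or c >= 0x80:
            result += ('%%%02X' % c).encode('ascii')
        else:
            result.append(c)
    return bytes(result)

def encode_http_url(url, page_encoding='ascii', errors='strict'):
    scheme, netloc, path, query, fragment = urlsplit(url)
    auth = port = None
    if '@' in netloc:
        auth, netloc = netloc.split('@', 1)
    if ':' in netloc:
        netloc, port = netloc.split(':', 1)
    host = netloc.encode('idna')  # nameprep + punycode, label by label
    if port:
        host += b':' + port.encode('ascii', errors)
    if auth:
        host = quote_unsafe_bytes(auth.encode('utf-8', errors)) + b'@' + host
    return urlunsplit((                      # bytes in, bytes out (3.2+)
        scheme.encode('ascii', errors),
        host,
        quote_unsafe_bytes(path.encode('utf-8', errors)),         # path: UTF-8
        quote_unsafe_bytes(query.encode(page_encoding, errors)),  # query: page encoding
        quote_unsafe_bytes(fragment.encode('utf-8', errors)),
    ))

print(encode_http_url('http://Réunion.com/Réunion.html?param=Réunion',
                      page_encoding='latin-1'))
# b'http://xn--runion-bva.com/R%C3%A9union.html?param=R%E9union'
```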
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 8:57 PM, Robert Collins wrote:
> bzr has a cache of decoded strings in it precisely because decode is slow. We accept slowness encoding to the user's locale because that's typically much less data to examine than we've examined while generating the commit/diff/whatever. We also face memory pressure on a regular basis, and that has been, at least partly, due to UCS4 - our translation cache helps there because we have fewer duplicate UCS4 strings.

Thanks for setting the record straight - apologies if I missed this earlier in the thread. It does seem vaguely familiar.
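The kind of decode cache described above can be sketched with `functools.lru_cache`. Repeated byte strings are decoded once and then share a single `str` object, which saves both the decoding time and the duplicate copies in memory. This is a hypothetical illustration, not bzr's actual implementation. An unbounded cache like this one grows without limit, so a real tool would want a `maxsize` or an explicit eviction policy.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def decode_cached(raw: bytes, encoding: str = 'utf-8') -> str:
    """Decode raw once per distinct argument set; later calls reuse the result."""
    return raw.decode(encoding)

a = decode_cached(b'r\xc3\xa9union')
b = decode_cached(b'r\xc3\xa9union')
print(a, a is b)                        # réunion True
print(decode_cached.cache_info().hits)  # 1
```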
Re: [Python-Dev] WPython 1.1 was released
On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
> sorry, I made a mistake, assuming that the project was known.

A common mistake of people who announce their projects ;-) Someone recently made the same mistake on python-list with respect to a 'BDD' package (Wikipedia suggests about 6 possible expansions of the acronym).

> WPython is a CPython 2.6.4 implementation that uses wordcodes instead of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)

I suggest you specify the base version (2.6.4) on the project page, as that would be very relevant to many who visit. One should not have to download and look at the source to discover if they should bother downloading the code. Perhaps also add a sentence as to the choice (why not 3.1?).

--
Terry Jan Reedy
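To make "wordcode" concrete: CPython itself later (in 3.6) moved to a fixed two-byte instruction unit, one byte of opcode followed by one byte of argument. The sketch below splits a function's `co_code` into such pairs. It shows CPython's own post-3.6 format only as an analogy and says nothing about how WPython lays out its wordcodes.

```python
import dis

def wordcodes(func):
    """Split a function's bytecode into (opname, arg) pairs, two bytes each.

    Assumes CPython 3.6+, where every instruction unit is exactly 16 bits.
    """
    code = func.__code__.co_code
    assert len(code) % 2 == 0
    return [(dis.opname[code[i]], code[i + 1]) for i in range(0, len(code), 2)]

def add(a, b):
    return a + b

for name, arg in wordcodes(add):
    print(name, arg)
```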
Re: [Python-Dev] WPython 1.1 was released
2010/6/23 Terry Reedy tjre...@udel.edu
> On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
>> WPython is a CPython 2.6.4 implementation that uses wordcodes instead of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)
>
> I suggest you specify the base version (2.6.4) on the project page, as that would be very relevant to many who visit. One should not have to download and look at the source to discover if they should bother downloading the code. Perhaps also add a sentence as to the choice (why not 3.1?).
>
> --
> Terry Jan Reedy

Thanks for the suggestions. I've updated the main project accordingly. :)

Cesare
Re: [Python-Dev] bytes / unicode
Bill Janssen wrote:
> The bigger problem seems to be that we're revisiting the design discussion about urllib.parse from the summer of 2008. See http://bugs.python.org/issue3300 if you want to recall how we hashed this out 2 years ago. I didn't particularly like that design, but I had to go off on vacation :-), and things got settled while I was away. I haven't heard much from Matt Giuca since he stopped by and lobbed that patch into the standard library. But since Guido is the one who settled it, why are we talking about it again?

Perhaps such decisions need revisiting in light of subsequent experience / pain / learning. E.g.:

- the repeated inability of the web-sig to converge on appropriate semantics for a Python3-compatible version of the WSGI spec;
- the subsequent quirkiness of the Python3 wsgiref implementation;
- the breakage in cgi.py which prevents handling file uploads in a web application;
- the slow adoption / porting rate of major web frameworks and libraries to Python 3.

Tres.
--
===
Tres Seaver +1 540-429-0999 tsea...@palladion.com
Palladion Software Excellence by Design http://palladion.com
Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
> The problem that _DARWIN_C_SOURCE introduces is that it replaces system getgroups with a database query, effectively making the true process' list of supplementary group IDs inaccessible to programs. See source code at http://www.opensource.apple.com/source/Libc/Libc-594.1.4/sys/getgroups.c.

If that is true (i.e. the file is really the one that is being used), I think this is a severe flaw in OSX's implementation of the POSIX specification.

Then, I agree that Python, in turn, should make sure that posix.getgroups is really the POSIX version of getgroups, not the Apple version. This is a general principle: if the system has two competing implementations of some API, the Python posix module should strive to call the POSIX version of the API. If the vendor's version of the API is also useful, it can be exposed under a different name (if, in turn, this is technically possible).

Just my 0.02€.

Regards,
Martin
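The two operations being distinguished here are both exposed by later Python versions. `os.getgroups()` asks for the calling process's supplementary group IDs. `os.getgrouplist(user, group)` (Unix, Python 3.3+) asks the group database which groups a user belongs to, always including `group` itself. A minimal sketch that fetches both for the current user so they can be compared. It assumes the current uid has a password-database entry. On macOS, the two lists may coincide for the reasons discussed in this thread.

```python
import os
import pwd

def group_lists():
    """Return (process_groups, database_groups) for the current user.

    process_groups: supplementary groups of this process (getgroups).
    database_groups: groups the user belongs to according to the group
    database (getgrouplist), including the user's primary group.
    """
    process_groups = sorted(set(os.getgroups()))
    entry = pwd.getpwuid(os.getuid())  # raises KeyError if the uid is unknown
    database_groups = sorted(set(os.getgrouplist(entry.pw_name, entry.pw_gid)))
    return process_groups, database_groups

procs, db = group_lists()
print('process: ', procs)
print('database:', db)
if procs != db:
    print('lists differ, e.g. after setgroups() or a group database change')
```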
Re: [Python-Dev] email package status in 3.X
On Jun 23, 2010, at 8:17 AM, Steve Holden wrote:
> Guido van Rossum wrote:
>> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
>>> Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates.
>>
>> FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the meantime I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected.
>
> +1
>
> The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward. This won't be a campaign with a victorious charge over some imaginary finish line.

For sure.

I don't speak for Tres, but I think he wasn't talking about optimism about *adoption*, overall, but optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :).

There has definitely been some irrational exuberance from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks.

I've never heard this position from the core team or any official communication or documentation. Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions.
Re: [Python-Dev] email package status in 3.X
Glyph Lefkowitz wrote:
> I don't speak for Tres, but I think he wasn't talking about optimism about *adoption*, overall, but optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :).

You channel me correctly here. In particular, the phrase "build it and they will come" was meant to address the idea that the only thing needed to drive adoption was the release of the new, shiny Python3. That particular bit of optimism is what I meant to describe as waning: the community on the whole seems to be more realistic now than two or three years ago about the kind of extra effort required from both core developers and from existing Python 2 folks to get to Python 3.

> There has definitely been some irrational exuberance from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks.
>
> I've never heard this position from the core team or any official communication or documentation. Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions.

Indeed.

Tres.
--
===
Tres Seaver +1 540-429-0999 tsea...@palladion.com
Palladion Software Excellence by Design http://palladion.com
Re: [Python-Dev] bytes / unicode
On Wed, 23 Jun 2010 14:23:33 -0400
Tres Seaver tsea...@palladion.com wrote:
> Perhaps such decisions need revisiting in light of subsequent experience / pain / learning. E.g.:
>
> - the repeated inability of the web-sig to converge on appropriate semantics for a Python3-compatible version of the WSGI spec;
> - the subsequent quirkiness of the Python3 wsgiref implementation;

The way wsgiref was adapted is admittedly suboptimal. It was totally broken at first, and PJE didn't want to look very deeply into it. We therefore had to settle on a series of small modifications that seemed rather reasonable, but without any in-depth discussion of what WSGI had to look like under Python 3 (since it was not our job and responsibility). Therefore, I don't think wsgiref should be taken as a guide to what a cleaned-up, Python 3-specific WSGI must look like.

> - the slow adoption / porting rate of major web frameworks and libraries to Python 3.

Some of the major web frameworks and libraries have a ton of dependencies, which would explain why they really haven't bothered yet.

I don't think you can claim, though, that Python 3 makes things significantly harder for these frameworks. The proof is that many of them already give the user unicode strings in Python 2.x. They must have somehow got the decoding right.

Regards

Antoine.
Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
On 23 Jun, 2010, at 16:48, Alexander Belopolsky wrote:

> On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren ronaldousso...@mac.com wrote:
> ..
>> * [Ronald's proposal] results in posix.getgroups not reflecting results of posix.setgroups
>
> This effectively substitutes getgrouplist called on the current user for getgroups. In 3.x, I believe the correct action will be to provide direct access to getgrouplist, which, while not POSIX (yet?), is widely available.

I don't mind adding getgrouplist, but that issue is separate from this one.

>> BTW. Apparently getgrouplist is posix (http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/libc.html), although this isn't a requirement for being added to the posix module.
>
> (The link you provided leads to Linux Standard Base Core Specification, which is different from POSIX, but the distinction is not relevant for our discussion.)

I know, but the page claims getgrouplist is in SUS. I've since looked at what claims to be a copy of SUS: http://www.unix.org/single_unix_specification/ and that does not contain getgrouplist.

>> It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch is more complicated and the library function we use can be considered to be broken.
>
> Let me try to formulate what the disagreement is. There are two different group lists that can be associated with a running process: 1) the list of current supplementary group IDs maintained by the system for each process and stored in per-process system tables; and 2) the list of the groups that include the uid under which the process is running as a member. The first list is returned by a system call getgroups and the second can be obtained using system database access functions as follows:
>
>     pw = getpwuid(getuid())
>     getgrouplist(pw->pw_name, ..)
>
> The first list can be modified by privileged processes using setgroups system call, while the second changes when system databases change.
> The problem that _DARWIN_C_SOURCE introduces is that it replaces system getgroups with a database query, effectively making the true process' list of supplementary group IDs inaccessible to programs. See source code at http://www.opensource.apple.com/source/Libc/Libc-594.1.4/sys/getgroups.c.
>
> The problem is complicated by the fact that the true OS X getgroups call appears to truncate the list of groups to NGROUPS_MAX=16. Note, however, that it is not clear whether the system call truncates the list or the underlying process tables are limited to 16 entries and additional groups are ignored when the process is created.
>
> In my view, getgroups and getgrouplist are two fundamentally different operations and both should be provided by the os module. Redefining os.getgroups to invoke getgrouplist instead of system getgroups on one particular platform to work around that platform's system call limitation is not right.

But we don't redefine os.getgroups to call getgrouplist; it is the system library that seems to implement getgroups(3) using getgrouplist(3). I agree that that is odd at best, but it is IMHO functioning as designed by Apple (that is, Apple chose the current behavior; they didn't accidentally break this).

The previous paragraph is nitpicky, but this is IMO an important distinction.

I've done some more experimentation:

* compat(5) lies: not setting _DARWIN_C_SOURCE is not the same as setting _DARWIN_C_SOURCE when the deployment target is 10.5. With _DARWIN_C_SOURCE, getgroups is translated to the symbol _getgroups$DARWIN_EXTSN in the object file; without it, the symbol is _getgroups.

* the id(1) command uses the version of getgroups that does not reflect setgroups.
Given this script:

    import os
    os.system("id")
    os.setgroups([1])
    os.system("id")

Running it gives an unexpected output:

    # /usr/bin/python doit.py
    uid=0(root) gid=0(wheel) groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing)
    uid=0(root) gid=0(wheel) groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing)

* when I add a group in the Accounts panel in System Preferences and add my account to it, the id(1) command immediately reflects the change (as expected given the previous result)

* adding a non-administrator account to a newly created group does not affect filesystem access for existing processes (that is, when I created a file that's only readable by the new group, the test user couldn't read that file until I logged out and in again), which means the Accounts panel doesn't magically alter kernel state for running processes.

* Setting or unsetting
Re: [Python-Dev] bytes / unicode
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
> On Wed, 23 Jun 2010 14:23:33 -0400
> Tres Seaver tsea...@palladion.com wrote:
>> - the slow adoption / porting rate of major web frameworks and libraries to Python 3.
>
> Some of the major web frameworks and libraries have a ton of dependencies, which would explain why they really haven't bothered yet.
>
> I don't think you can claim, though, that Python 3 makes things significantly harder for these frameworks. The proof is that many of them already give the user unicode strings in Python 2.x. They must have somehow got the decoding right.

Note that this assumption seems optimistic to me. I started talking to Graham Dumpleton, author of mod_wsgi, a couple years back because mod_wsgi and paste do decoding of bytes to unicode at different layers, which caused problems for application-level code that should otherwise run fine when being served by mod_wsgi or paste httpserver. That was the beginning of Graham starting to talk about what the wsgi spec really should look like under python3 instead of the broken way that the appendix to the current wsgi spec states.

-Toshio
Re: [Python-Dev] bytes / unicode
On Wed, 23 Jun 2010 17:30:22 -0400
Toshio Kuratomi a.bad...@gmail.com wrote:
> Note that this assumption seems optimistic to me. I started talking to Graham Dumpleton, author of mod_wsgi, a couple years back because mod_wsgi and paste do decoding of bytes to unicode at different layers, which caused problems for application-level code that should otherwise run fine when being served by mod_wsgi or paste httpserver. That was the beginning of Graham starting to talk about what the wsgi spec really should look like under python3 instead of the broken way that the appendix to the current wsgi spec states.

Ok, but the reason would be that the WSGI spec is broken. Not Python 3 itself.

Regards

Antoine.
[Python-Dev] swig/python and Intel's Threading Building Blocks
Hi,

I've compiled Intel's open-source Threading Building Blocks library on OpenBSD and put everything in some SWIG interfaces. Here you go: http://tullarisc.xtreemhost.com/swig.ttb.tgz

Love,
tullarisc.
Re: [Python-Dev] [Web-SIG] bytes / unicode
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
> I don't think you can claim, though, that Python 3 makes things significantly harder for these frameworks. The proof is that many of them already give the user unicode strings in Python 2.x. They must have somehow got the decoding right.

Well... Frameworks usually 'simplify' the problem by partly ignoring it. By default they assume the data in the request is UTF-8. You can specify an alternative encoding in most of them. Django [1], Werkzeug [2], and WebOb [3] do that.

The problem with this approach is that you still have to deal with weird requests where one thing is UTF-8 and another is latin-1. Sometimes you can even have 2 different encodings in a single header, like Cookies. There's no general solution to this problem; it has to be solved on a case-by-case basis.

There was a big discussion a while ago on web-sig. I think the consensus was that WSGI for Python 3 should assume that the data is encoded in latin-1, since it's the default encoding according to the RFC.

[1] http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.encoding
[2] http://werkzeug.pocoo.org/documentation/dev/unicode.html#request-and-response-objects
[3] http://pythonpaste.org/webob/reference.html#unicode-variables

--
Henry Prêcheur
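Decoding request data as latin-1 is attractive because it is lossless: every byte maps to exactly one code point below 256, so the original bytes can always be recovered and re-decoded once the application knows better. A minimal sketch of that round trip follows. `redecode` is a hypothetical helper, and the "try UTF-8, fall back to latin-1" policy is an assumption for illustration, not something any of the frameworks above is claimed to do.

```python
def redecode(native: str, preferred: str = 'utf-8', fallback: str = 'latin-1') -> str:
    """Re-interpret a latin-1-decoded string using a better-guessed encoding.

    Hypothetical helper: encoding with latin-1 recovers the original bytes
    exactly; they are then decoded as `preferred`, or `fallback` on failure.
    """
    raw = native.encode('latin-1')
    try:
        return raw.decode(preferred)
    except UnicodeDecodeError:
        return raw.decode(fallback)

# What a server would hand over after decoding raw request bytes as latin-1:
from_utf8_page = b'R\xc3\xa9union'.decode('latin-1')  # 'RÃ©union'
from_latin1_page = b'R\xe9union'.decode('latin-1')    # 'Réunion'

print(redecode(from_utf8_page))    # Réunion
print(redecode(from_latin1_page))  # Réunion (UTF-8 fails, latin-1 fallback)
```

The fallback is only a heuristic. Latin-1 text whose bytes happen to form valid UTF-8 will be misread, which is exactly the case-by-case problem described above.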
[Python-Dev] what environment variable should contain compiler warning suppression flags?
I finally realized why clang has not been silencing its warnings about unused return values: I have -Wno-unused-value set in CFLAGS, which comes before OPT (which defines -Wall) as set in PY_CFLAGS in Makefile.pre.in. I could obviously set OPT in my environment, but that would override the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, but the README says that's for stuff that tweaks binary compatibility.

So basically what I am asking is: which environment variable should I use? If CFLAGS is correct, then does anyone have any issues if I change the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes after OPT?
Re: [Python-Dev] bytes / unicode
On Wed, Jun 23, 2010 at 11:35:12PM +0200, Antoine Pitrou wrote:
> On Wed, 23 Jun 2010 17:30:22 -0400
> Toshio Kuratomi a.bad...@gmail.com wrote:
>> Note that this assumption seems optimistic to me. I started talking to Graham Dumpleton, author of mod_wsgi, a couple years back because mod_wsgi and paste do decoding of bytes to unicode at different layers, which caused problems for application-level code that should otherwise run fine when being served by mod_wsgi or paste httpserver. That was the beginning of Graham starting to talk about what the wsgi spec really should look like under python3 instead of the broken way that the appendix to the current wsgi spec states.
>
> Ok, but the reason would be that the WSGI spec is broken. Not Python 3 itself.

Agreed. Neither python2 nor python3 is broken. It's the wsgi spec and the implementation of that spec where things fall down. From your first post, I thought you were claiming that python3 was broken since web frameworks got decoding right on python2, and I just wanted to defend python3 by showing that python2 wasn't all sunshine and roses.

-Toshio
Re: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities
On Jun 22, 2010, at 5:14 PM, Craig Younkins wrote:
> I suggest rewording the documentation for the method making it more clear what it should and should not be used for. I would like to see the method changed to properly escape single-quotes, but if it is not changed, the documentation should explicitly say this method does not make input safe for inclusion in HTML.

Well, it *does* make the input safe for inclusion in HTML... in a double-quoted attribute. The docs could make it clearer that you should always use double-quotes around your attribute values when using it, though, I agree.
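The double-quoted-attribute caveat can be seen directly. The helper below reproduces what `cgi.escape` did: it escaped `&`, `<`, and `>`, and escaped `"` only when `quote` was true (the default was false). It is written out by hand because `cgi.escape` was removed in Python 3.8. `html.escape`, added in Python 3.2, also escapes the single quote by default. A sketch for illustration only; the payload is a made-up example.

```python
import html

def cgi_style_escape(s, quote=False):
    """Re-creation of the old cgi.escape behaviour (removed in Python 3.8)."""
    s = s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
    if quote:
        s = s.replace('"', '&quot;')
    return s

payload = "x' onmouseover='alert(1)"

# Safe inside a double-quoted attribute: the value cannot close the quote.
print('<a title="%s">' % cgi_style_escape(payload, quote=True))
# Unsafe inside a single-quoted attribute: the ' passes through unescaped.
print("<a title='%s'>" % cgi_style_escape(payload, quote=True))
# html.escape also escapes the single quote (as &#x27;).
print("<a title='%s'>" % html.escape(payload))
```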
Re: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7
See also http://gimper.net/viewtopic.php?f=18&t=3185.

Bill