Re: [Python-Dev] Dropping __init__.py requirement for subpackages
Guido van Rossum wrote:
> On 4/26/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
>> On Wed, 2006-04-26 at 10:16 -0700, Guido van Rossum wrote:
>>> So I have a very simple proposal: keep the __init__.py requirement
>>> for top-level packages, but drop it for subpackages. This should be
>>> a small change. I'm hesitant to propose *anything* new for Python
>>> 2.5, so I'm proposing it for 2.6; if Neal and Anthony think this
>>> would be okay to add to 2.5, they can do so.
[...]
>> I'd be -1 but the remote possibility of you being burned at the stake
>> by your fellow Googlers makes me -0 :).
>
> I'm not sure I understand what your worry is.

I happen to be a Googler too, but I was a Pythonista first... I'm -1
for minor, mainly subjective reasons:

1) Explicit is better than implicit. I prefer to be explicit about what
is and isn't a module. I have plenty of "doc" and "test" and other
directories inside Python module source trees that I don't want to be
Python modules.

2) It feels more consistent to always require it. /foo/ is a Python
package because it contains an __init__.py... so package /foo/bar/
should have one too.

3) It changes things for what feels like very little gain. I've never
had problems with it, and don't find the import exception hard to
diagnose.

Note that I think the vast majority of "newbie missing __init__.py"
problems within Google occur because people are missing __init__.py at
the root of the package import tree. This change would not solve that
problem. It wouldn't surprise me if this change introduced a slew of
newbies complaining "I have /foo on my PYTHONPATH, why can't I import
foo/bar/" because they've forgotten the (now) rarely required
__init__.py.

--
Donovan Baarda
[Python-Dev] Default Locale, was: Re: strftime/strptime locale funnies...
On Wed, 2006-04-05 at 12:13 -0700, Brett Cannon wrote:
> On 4/5/06, Donovan Baarda <[EMAIL PROTECTED]> wrote:
>> G'day,
>>
>> Just noticed on Debian (testing), Ubuntu (warty?), and RedHat (old)
>> based systems Python's time.strptime() seems to ignore the
>> environment's locale and just uses "C".
[...]
> Beats me. This could be a locale thing. If I remember correctly
> Python assumes the C locale on some things. I suspect the reason for
> this is in the locale module or libc. But you can't even find the
> word 'locale' or 'Locale' in timemodule.c nor do I know of any calls
> that mess with the locale, so I doubt 'time' is at fault for this.

OK, I've found and confirmed what it is with a quick C program. The
default locale for libc is "C". It is up to the program to set its
locale to match the environment using:

    setlocale(LC_ALL, "");

The Python locale module documents this, and recommends putting:

    import locale
    locale.setlocale(locale.LC_ALL, '')

at the top of programs to make them use your locale as specified in
your environment.

Note that locale.resetlocale() is documented as "resets the locale to
the default settings", where the default is determined by
locale.getdefaultlocale(), which uses the environment. So the "default"
is determined from your environment, but "C" is used by default... nice
and confusing :-)

Should Python do setlocale(LC_ALL, "") on startup so that the "default"
locale is used by default?

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
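The behaviour is easy to demonstrate (a minimal sketch; the second
result assumes an en_AU.UTF-8 environment, and details vary by
platform):

    import locale
    import time

    # Python starts up in the "C" locale, so %x gives US-style dates.
    t = time.strptime("1999-02-22", "%Y-%m-%d")
    print time.strftime("%x", t)     # -> '02/22/99'

    # Adopt the locale from the environment (LANG, LC_ALL, etc.).
    locale.setlocale(locale.LC_ALL, '')
    print time.strftime("%x", t)     # -> '22/02/99' under en_AU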
[Python-Dev] strftime/strptime locale funnies...
G'day,

Just noticed on Debian (testing), Ubuntu (warty?), and RedHat (old)
based systems Python's time.strptime() seems to ignore the
environment's locale and just uses "C".

Last time I looked at this, time.strptime() leveraged off the
platform's strptime(), which meant it had all the extra features, bugs
and missingness of the platform's implementation. We now seem to be
using a Python implementation in _strptime.py. This implementation does
locales by feeding a magic date to time.strftime() and figuring out how
it formats it.

This revealed that time.strftime() is not honouring the locale
settings, which is causing the new Python strptime() to also get it
wrong.

    $ set | grep "^LC\|LANG"
    GDM_LANG=en_AU.UTF-8
    LANG=en_AU.UTF-8
    LANGUAGE=en_AU.UTF-8
    LC_COLLATE=C
    $ date -d "1999-02-22" +%x
    22/02/99
    $ python
    ...
    >>> import time
    >>> time.strftime("%x", time.strptime("1999-02-22","%Y-%m-%d"))
    '02/22/99'

This is consistent across all three platforms for multiple Python
versions, including 2.1 and 1.5 (where they were available), which BTW
don't use the Python implementation of strptime(). This suggests that
all three of these platforms have a broken libc strftime()
implementation... but all three? And why does date work?

Can others reproduce this? Have I done something stupid? Is this a bug,
and in what, libc or Python?

Slightly OT, is it wise to use a Python strptime() on platforms that
have a perfectly good one in libc? The Python reverse-engineering of
libc's strftime() output to figure out locale formatting is clever,
but...

I see there have already been bugs submitted about strftime/strptime
non-symmetry for things like support of extensions. There has also been
a bug against strptime() locale switching not working because of
caching locale formatting info from the strftime() analysis, but I
can't seem to get non-C locales working at all...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Threading idea -- exposing a global thread lock
On Tue, 2006-03-14 at 00:36 -0500, Raymond Hettinger wrote:
> [Guido]
>> Oh, no!
>
> Before shooting this one down, consider a simpler incarnation not
> involving the GIL. The idea is to allow an active thread to
> temporarily suspend switching for a few steps:
[...]
> I disagree that the need is rare. My own use case is that I sometimes
> add some debugging print statements that need to execute atomically --
> it is a PITA because PRINT_ITEM and PRINT_NEWLINE are two different
> opcodes and are not guaranteed to pair atomically. The current
> RightWay(tm) is for me to create a separate daemon thread for printing
> and to send lines to it via the queue module (even that is tricky
> because you don't want the main thread to exit before a queued print
> item is completed). I suggest that that is too complex for a simple
> debugging print statement. It would be great to simply write:

You don't need to use a queue... that has the potentially nasty side
effect of allowing threads to run ahead before their debugging has been
output. A better way is to have all your debugging go through a
print_debug() method that acquires and releases a debug_lock
threading.Lock. This is simpler as it avoids the separate thread, and
ensures that threads "pause" until their debugging output is done.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
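A minimal sketch of that approach (print_debug and debug_lock are
illustrative names, not an existing API):

    import threading

    debug_lock = threading.Lock()

    def print_debug(msg):
        # Serialise output so lines from different threads never
        # interleave, and block the caller until its output is done.
        debug_lock.acquire()
        try:
            print msg
        finally:
            debug_lock.release()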
Re: [Python-Dev] Threading idea -- exposing a global thread lock
On Mon, 2006-03-13 at 21:06 -0800, Guido van Rossum wrote:
> Oh, no! Please!
>
> I just had to dissuade someone inside Google from the same idea.

Heh... that was me... I LOL'ed when I saw this... and no, I didn't put
Raymond up to it :-)

> IMO it's fatally flawed for several reasons: it doesn't translate
> reasonably to Jython or IronPython, it's really tricky to implement,
> and it's an invitation for deadlocks. The danger of this thing in the
> wrong hands is too big to warrant the (rare) use case that can only be
> solved elegantly using direct GIL access.

I didn't bother pursuing it because I'm not that attached to it... I'm
not sure that a language like Python really needs it, and I don't do
that kind of programming much any more.

When I did, I was programming in Ada. The Ada language has a global
thread-lock used as a primitive to implement all other atomic
operations and thread-synchronisation stuff... (it's been a while...
this may have been a particular Ada compiler extension, though I think
the Ada concurrency model pretty much required it). And before that it
was in assembler; an atomic section was done by disabling all
interrupts. At that low level, atomic sections were the building-block
for all the other higher-level synchronisation tools. I believe the
original semaphore relied on an atomic test-and-set operation.

The main place where something like this would be useful in Python is
in writing thread-safe code that uses non-thread-safe resources.
Examples are: a chunk of code that redirects then restores sys.stdout,
something that changes then restores TZ using time.settz(), etc.

I think the deadlock risk argument is bogus... any locking has deadlock
risks. The "danger in the wrong hands" I'm also unconvinced about;
non-threadsafe resource use worries me far more than a strong lock. I'd
rather debug a deadlock than a race condition any day. But the hard to
implement for other VMs is a breaker, and suggests there are damn good
reasons those VMs disallow it that I haven't thought of :-)

So I'm +0, probably -0.5...

> --Guido
>
> On 3/13/06, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
>> A user on comp.lang.python has twisted himself into knots writing
>> multi-threaded code that avoids locks and queues but fails when
>> running code with non-atomic access to a shared resource. While his
>> specific design is somewhat flawed, it does suggest that we could
>> offer an easy way to make a block of code atomic without the
>> complexity of other synchronization tools:
>>
>>     gil.acquire()
>>     try:
>>         # do some transaction that needs to be atomic
>>     finally:
>>         gil.release()
>>
>> The idea is to temporarily suspend thread switches (either using the
>> GIL or a global variable in the eval-loop). Think of it as
>> "non-cooperative" multi-threading. While this is a somewhat rough
>> approach, it is dramatically simpler than the alternatives (i.e.
>> wrapping locks around every access to a resource or feeding all
>> resource requests to a separate thread via a Queue).
>>
>> While I haven't tried it yet, I think the implementation is likely to
>> be trivial.
>>
>> FWIW, the new with-statement makes the above fragment even more
>> readable:
>>
>>     with atomic_transaction():
>>         # do a series of steps without interruption
>>
>> Raymond

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
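For comparison, a sketch of what Raymond's atomic_transaction() could
look like built on an ordinary lock rather than the GIL (hypothetical
code, requiring Python 2.5's with-statement; unlike a real GIL hold, it
only excludes other code that uses the same helper):

    import threading
    from contextlib import contextmanager

    _atomic_lock = threading.RLock()

    @contextmanager
    def atomic_transaction():
        # Mutual exclusion only among users of this helper; unrelated
        # threads keep running, so this is weaker than a GIL hold.
        _atomic_lock.acquire()
        try:
            yield
        finally:
            _atomic_lock.release()

    # with atomic_transaction():
    #     do a series of steps without interruption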
Re: [Python-Dev] bytes.from_hex()
On Tue, 2006-02-28 at 15:23 -0800, Bill Janssen wrote:
> Greg Ewing wrote:
>> Bill Janssen wrote:
>>
>>> bytes -> base64 -> text
>>> text -> de-base64 -> bytes
>>
>> It's nice to hear I'm not out of step with
>> the entire world on this. :-)
>
> Well, I can certainly understand the bytes->base64->bytes side of
> things too. The "text" produced is specified as using "a 65-character
> subset of US-ASCII", so that's really bytes.

Huh... just joining here, but surely you don't mean a text string that
doesn't use every character available in a particular encoding is
"really bytes"... it's still a text string...

If you base64 encode some bytes, you get a string. If you then want to
access that base64 string as if it was a bunch of bytes, cast it to
bytes.

Be careful not to confuse "(type)cast" with "(type)convert"... A
"convert" transforms the data from one type/class to another, modifying
it to be a valid equivalent instance of the other type/class; e.g.
int -> float. A "cast" does not modify the data in any way, it just
changes its type/class to be the other type, and assumes that the data
is a valid instance of the other type; e.g. int32 -> bytes[4]. Minor
data munging under the hood to cleanly switch the type/class is
acceptable (e.g. adding array length info etc.) provided you keep to
the spirit of the "cast".

Keep these two concepts separate and you should be right :-)

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
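In Python 2.x terms, where both ends of the round trip happen to be
str (a quick illustration):

    import base64

    data = "\x00\xffbinary\x10data"        # raw bytes in a str
    text = base64.b64encode(data)          # 'AP9iaW5hcnkQZGF0YQ=='
    assert base64.b64decode(text) == data  # round trip is lossless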
Re: [Python-Dev] calendar.timegm
On Tue, 2006-02-21 at 22:47 -0600, [EMAIL PROTECTED] wrote:
> Sergey> Historical question ;)
>
> Sergey> Anyone can explain why function timegm is placed into module
> Sergey> calendar, not to module time, where it would be near with
> Sergey> similar function mktime?
>
> Historical accident. ;-)

It seems the time module contains simple wrappers around the equivalent
C functions. There is no C equivalent to timegm() (how do they do it?).
The timegm() function is implemented in Python using the datetime
module. The name sux BTW.

It would be nice if there was a time.mkgmtime(), but it would need to
be implemented in C.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
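The asymmetry, concretely (mkgmtime is the wished-for name; here it is
just a thin pure-Python alias):

    import time
    import calendar

    t = time.gmtime(0)          # the epoch, as a UTC struct_time
    print calendar.timegm(t)    # 0 -- interprets t as UTC
    print time.mktime(t)        # offset by your timezone -- local only

    def mkgmtime(tm):
        # The missing inverse of time.gmtime(), cf. time.mktime().
        return calendar.timegm(tm)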
Re: [Python-Dev] threadsafe patch for asynchat
On Thu, 2006-02-09 at 13:12 +0100, Fredrik Lundh wrote:
> Donovan Baarda wrote:
>
>>> Here I think you meant that medusa didn't handle computation in
>>> separate threads instead.
>>
>> No, I pretty much meant what I said :-)
>>
>> Medusa didn't have any concept of a deferred, hence the idea of using
>> one to collect the results of a long computation in another thread
>> never occurred to them... remember the highly refactored OO beauty
>> that is Twisted was not even a twinkle in anyone's eye yet.
>>
>> In theory it would be just as easy to add Twisted-style deferToThread
>> to Medusa, and IMHO it is a much better approach. Unfortunately at
>> the time they went the other way and implemented multiple async-loops
>> in separate threads.
>
> that doesn't mean that everyone using Medusa has done things in the
> wrong way, of course ;-)

Of course... and even Zope2 was not necessarily the "wrong way"... it
was a perfectly valid design decision, given that it was all new ground
at the time. And it works really well... there were many consequences
of that design that probably contributed to the robustness of other
Zope components like ZODB...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] threadsafe patch for asynchat
On Wed, 2006-02-08 at 15:14 +0100, Valentino Volonghi aka Dialtone wrote:
> On Wed, Feb 08, 2006 at 01:23:26PM +0000, Donovan Baarda wrote:
>> I believe that Twisted does pretty much this with its "deferred"
>> stuff. It shoves slow stuff off for processing in a separate thread
>> that re-syncs with the event loop when it's finished.
>
> Deferreds are only an elaborate way to deal with a bunch of callbacks.
> It's Twisted itself that provides a way to run something in a separate
> thread and then fire a deferred (from the main thread) when the child
> thread finishes (reactor.callInThread() to call stuff in a different
> thread, [...]

I know they are more than just a way to run slow stuff in threads, but
once you have them, simple as they are, they present an obvious
solution to all sorts of things, including long computations in a
thread.

Note that once Zope2 took the approach it did, blocking the async-loop
didn't hurt so bad, so lots of Zope add-ons just did it gratuitously.
In many cases the slow event handlers were slow because they were
waiting on IO that could in theory be serviced as yet another event
handler in the async-loop. However, the Zope/Medusa async framework had
become so scary hardly anyone knew how to do this without breaking Zope
itself.

>> In the case of Zope/ZEO I'm not entirely sure but I think what
>> happened was medusa (asyncore/asynchat based stuff Zope2 was based
>> on) didn't have this deferred handler support. When they found some
>> of the stuff
>
> Here I think you meant that medusa didn't handle computation in
> separate threads instead.

No, I pretty much meant what I said :-)

Medusa didn't have any concept of a deferred, hence the idea of using
one to collect the results of a long computation in another thread
never occurred to them... remember the highly refactored OO beauty that
is Twisted was not even a twinkle in anyone's eye yet.

In theory it would be just as easy to add Twisted-style deferToThread
to Medusa, and IMHO it is a much better approach. Unfortunately at the
time they went the other way and implemented multiple async-loops in
separate threads.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
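The Twisted idiom being contrasted with Medusa, roughly (requires
Twisted; deferToThread was the API of that era):

    from twisted.internet import reactor, threads

    def slow_computation():
        # Runs in a worker thread, off the event loop.
        return 6 * 7

    def report(result):
        print "result:", result
        reactor.stop()

    d = threads.deferToThread(slow_computation)  # returns a Deferred
    d.addCallback(report)                        # fired from the main loop
    reactor.run()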
Re: [Python-Dev] threadsafe patch for asynchat
On Wed, 2006-02-08 at 02:33 -0500, Steve Holden wrote:
> Martin v. Löwis wrote:
>> Tim Peters wrote:
[...]
>> What is the reason that people want to use threads when they can have
>> poll/select-style message processing? Why does Zope require threads?
>> IOW, why would anybody *want* a "threadsafe patch for asynchat"?
>
> In case the processing of events needed to block? If I'm processing
> web requests in an async* dispatch loop and a request needs the
> results of a (probably lengthy) database query in order to generate
> its output, how do I give the dispatcher control again to process the
> next asynchronous network event?
>
> The usual answer is "process the request in a thread". That way the
> dispatcher can spring to life for each event as quickly as needed.

I believe that Twisted does pretty much this with its "deferred" stuff.
It shoves slow stuff off for processing in a separate thread that
re-syncs with the event loop when it's finished.

In the case of Zope/ZEO I'm not entirely sure, but I think what
happened was medusa (the asyncore/asynchat based stuff Zope2 was based
on) didn't have this deferred handler support. When they found some of
the stuff Zope was doing took a long time, they came up with an
initially simpler but IMHO uglier solution of running multiple async
loops in separate threads and using a front-end dispatcher to
distribute connections to them. This way it wasn't too bad if an async
loop stalled, because the other loops in other threads could continue
to process stuff.

If ZEO is still using this approach I think switching to a Twisted
style approach would be a good idea. However, I suspect this would be a
very painful refactor...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] syntactic support for sets
On Mon, 2006-02-06 at 15:36 +0100, Ronald Oussoren wrote:
> On Monday, February 06, 2006, at 03:12PM, Donovan Baarda
> <[EMAIL PROTECTED]> wrote:
>
>> On Fri, 2006-02-03 at 20:02 +0100, "Martin v. Löwis" wrote:
>>> Donovan Baarda wrote:
>>>> Before set() the standard way to do them was to use dicts with None
>>>> values... to me the "{1,2,3}" syntax would have been a logical
>>>> extension of the "a set is a dict with no values, only keys"
>>>> mindset. I don't know why it wasn't done this way in the first
>>>> place, though I missed the arguments where it was rejected.
>>>
>>> There might be many reasons; one obvious reason is that you can't
>>> spell the empty set that way.
>>
>> Hmm... how about "{,}", which is the same trick tuples use for the
>> empty tuple?
>
> Isn't () the empty tuple? I guess you're confusing this with a single
> element tuple: (1,) instead of (1) (well actually it is "1,")

Yeah, sorry.. nasty brainfart...

> BTW. I don't like your proposal for spelling the empty set as {,}
> because that is entirely non-obvious. If {1,2,3} were a valid way to
> spell a set literal, I'd expect {} for the empty set.

Yeah... the problem is differentiating the empty set from an empty
dict. The only alternative that occurred to me was the not-so-nice and
not-backwards-compatible "{:}" for an empty dict and "{}" for an empty
set.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
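The ambiguity in question, spelled out in the interpreter:

    >>> type({})          # empty braces already mean an empty dict
    <type 'dict'>
    >>> set(), dict()     # so the empty set needs some other spelling
    (set([]), {})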
Re: [Python-Dev] syntactic support for sets
On Fri, 2006-02-03 at 20:02 +0100, "Martin v. Löwis" wrote:
> Donovan Baarda wrote:
>> Before set() the standard way to do them was to use dicts with None
>> values... to me the "{1,2,3}" syntax would have been a logical
>> extension of the "a set is a dict with no values, only keys" mindset.
>> I don't know why it wasn't done this way in the first place, though I
>> missed the arguments where it was rejected.
>
> There might be many reasons; one obvious reason is that you can't
> spell the empty set that way.

Hmm... how about "{,}", which is the same trick tuples use for the
empty tuple?

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] syntactic support for sets
On Fri, 2006-02-03 at 11:56 -0800, Josiah Carlson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> wrote:
[...]
>> Nuff was a fairy... though I guess it depends on where you draw the
>> line; should [1,2,3] be list(1,2,3)?
>
> Who is "Nuff"?

Fairynuff... :-)

> Along the lines of "not every x line function should be a builtin",
> "not every builtin should have syntax". I think that sets have
> particular uses, but I don't believe those uses are sufficiently
> varied enough to warrant the creation of a syntax. I suggest that
> people take a walk through their code. How often do you use other
> sequence and/or mapping types? How many lists, tuples and dicts are
> there? How many sets? Ok, now how many set literals?

The absence of sets in early Python, the requirement to "import sets"
when they first appeared, and the lack of a set syntax now all mean
that people tend to avoid using sets and resort to lists, tuples, and
"dicts of None" instead, even though they really want a set.

Anywhere you see "if value in sequence:", they probably mean sequence
is a set, and this code would run much faster if it really was, and
might even avoid potential bugs because it would prevent duplicates...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
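A rough illustration of that membership-test difference (the numbers
are indicative only and vary by machine):

    import timeit

    for kind in ("list", "set"):
        setup = "seq = %s(range(10000))" % kind
        t = timeit.Timer("9999 in seq", setup).timeit(10000)
        print kind, t
    # The list does an O(n) scan; the set does an O(1) hash lookup.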
Re: [Python-Dev] syntactic support for sets
On Fri, 2006-02-03 at 09:00 -0800, Josiah Carlson wrote:
[...]
> Sets are tacked on. That's why you need to use 'import sets' to get to
> them, in a similar fashion that you need to use 'import array' to get
> access to C-like arrays.

No you don't;

    $ python
    Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
    [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> v=set((1,2,3))
    >>> f=frozenset(v)
    >>>

set and frozenset are now builtin.

> I personally object to making syntax for sets for the same reasons I
> object to making arrays, heapqs, Queues, deques, or any of the other
> data structure-defining modules in the standard library into syntax.

Nuff was a fairy... though I guess it depends on where you draw the
line; should [1,2,3] be list(1,2,3)?

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] syntactic support for sets
On Fri, 2006-02-03 at 12:04 +0000, Donovan Baarda wrote:
> On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
[...]
> Personally I'd like this. Currently the "set(...)" syntax makes sets
> feel tacked on compared to tuples, lists, dicts, and strings, which
> have nice built-in syntax support. Many people don't realise they are
> there because of this.
[...]
> Frozensets are to sets what tuples are to lists. It would be nice if
> there was another type of bracket that could be used for frozenset...
> something like ':1,2,3:'... yuk... I dunno.

One possible bracket option for frozenset would be "<1,2,3>", which I
initially rejected because of the possible syntactic clash with the <
and > operators... however, there may be a way this could work...
dunno.

The other thing that keeps nagging me is set, frozenset, tuple, and
list all overlap in functionality to fairly significant degrees.
Sometimes it feels like just implementation or application
differences... could a list that is never modified be optimised under
the hood as a tuple? Could the immutability constraint of tuples be
just acquired by a list when it is used as a key? Could a set simply be
a list with unique values? etc.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
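The tuple/list parallel in action -- only the frozen variant is
hashable (the exact error message varies by version):

    >>> d = {}
    >>> d[frozenset([1, 2, 3])] = "ok"
    >>> d[set([1, 2, 3])] = "boom"
    Traceback (most recent call last):
      ...
    TypeError: set objects are unhashable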
Re: [Python-Dev] syntactic support for sets
On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
> Hi,
>
> I have a student who may be interested in adding syntactic support for
> sets to Python, so that:
>
>     x = {1, 2, 3, 4, 5}
>
> and:
>
>     y = {z for z in x if (z % 2)}

Personally I'd like this. Currently the "set(...)" syntax makes sets
feel tacked on compared to tuples, lists, dicts, and strings, which
have nice built-in syntax support. Many people don't realise they are
there because of this.

Before set() the standard way to do them was to use dicts with None
values... to me the "{1,2,3}" syntax would have been a logical
extension of the "a set is a dict with no values, only keys" mindset. I
don't know why it wasn't done this way in the first place, though I
missed the arguments where it was rejected.

As for frozenset vs set, I would be inclined to make them normal
mutable sets. This is in line with the "dict without values" idea.
Frozensets are to sets what tuples are to lists. It would be nice if
there was another type of bracket that could be used for frozenset...
something like ':1,2,3:'... yuk... I dunno.

Alternatively you could do the same thing we do with strings; add a
prefix char for different variants; {1,2,3} is a set, f{1,2,3} is a
frozen set...

For Python 3000 you could extend this approach to lists and dicts;
[1,2,3] is a list, f[1,2,3] is a "frozen list" or tuple, {1:'a',2:'b'}
is a dict, f{1:'a',2:'b'} is a "frozen dict" which can be used as a key
in other dicts... etc.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
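The pre-set() idiom referred to above, for contrast:

    # A "set" before sets existed: only the keys matter.
    colours = {"red": None, "green": None, "blue": None}
    # dict.fromkeys() tidied the same idiom up a little:
    colours = dict.fromkeys(["red", "green", "blue"])
    if "red" in colours:
        pass  # membership tests work just like a real set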
Re: [Python-Dev] Octal literals
On Wed, 2006-02-01 at 19:09 +0000, M J Fleming wrote:
> On Wed, Feb 01, 2006 at 01:35:14PM -0500, Barry Warsaw wrote:
>> The proposal for something like 0xff, 0o664, and 0b1001001 seems like
>> the right direction, although 'o' for octal literal looks kind of
>> funky. Maybe 'c' for oCtal? (remember it's 'x' for heXadecimal).
>>
>> -Barry
>
> +1

+1 too. It seems like a "least changes" way to fix the IMHO strange
0123 != 123 behaviour.

Any sort of arbitrary base syntax is overkill; decimal, hexadecimal,
octal, and binary cover 99.9% of cases. The 0.1% of other cases are
very special, and can use int("LITERAL", base=RADIX).

For me, binary is far more useful than octal, so I'd be happy to let
octal languish as legacy support, but I definitely want "0b10110101".

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
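The escape hatch for those rare cases already exists:

    >>> int("644", 8)        # octal
    420
    >>> int("10110101", 2)   # binary
    181
    >>> int("zz", 36)        # the 0.1% case
    1295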
Re: [Python-Dev] New Pythondoc by effbot
On Sat, 2006-01-21 at 19:15 -0500, Terry Reedy wrote:
>>> http://effbot.org/lib/os.path.join
>
> On this page, 8 of 30 entries have a 'new in' comment. For anyone with
> no interest in the past, these constitute noise. I wonder if for 3.0,
> the

Even the past is relative... I find the "new in" doco absolutely
essential, because my "present" depends on what system I'm on, and some
of them are stuck in a serious time-warp. I do not have a time-machine
big enough to transport whole companies.

> timer can be reset and the docs start clean again. To keep them
> backwards compatible, they would also have to be littered with
> 'changed in 3.0' and 'deleted in 3.0' entries. Better, I think, to
> refer people to the last 2.x docs and a separate 2.x/3.0 changes doc.

I also find "changed in" essential, but I don't mind not having
"deleted in"... it encourages developers stuck in those time-warps to
avoid features that get deleted in the future :-)

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] str with base
On Tue, 2006-01-17 at 20:25 -0800, Guido van Rossum wrote:
> On 1/17/06, Bob Ippolito <[EMAIL PROTECTED]> wrote:
>> There shouldn't be a %B for the same reason there isn't an %O or %D
>> -- they're all just digits, so there's not a need for an uppercase
[...]

So %b is "binary". +1

>> The difference between hex() and oct() and the proposed binary() is
>
> I'd propose bin() to stay in line with the short abbreviated names.
[...]

+1

> The binary type should have a 0b prefix.
[...]

+1

For those who argue "who would ever use it?", I would :-)

Note that this does not support, and is independent of supporting,
arbitrary bases. I don't think we need to support arbitrary bases, but
if we did I would vote for ".precision" to mean ".base" for "%d"; i.e.

    "%3.3d" % 5 == " 12"

I think supporting arbitrary bases for floats is way overkill and not
worth considering.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
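A pure-Python sketch of the proposed bin() for interpreters that lack
it (bin() was not a builtin at the time; non-negative ints only):

    def bin(n):
        if n == 0:
            return "0b0"
        digits = []
        while n:
            digits.append("01"[n & 1])
            n >>= 1
        digits.reverse()
        return "0b" + "".join(digits)

    assert bin(12) == "0b1100"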
Re: [Python-Dev] str with base
On Tue, 2006-01-17 at 16:38 -0700, Adam Olsen wrote:
> On 1/17/06, Thomas Wouters <[EMAIL PROTECTED]> wrote:
>> On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote:
[...]
> I dream of a day when str(3.25, base=2) == '11.01'. That is the
> number a float really represents. It would be so much easier to
> understand why floats behave the way they do if it were possible to
> print them in binary.
[...]

Heh... that's pretty much why I used base-16 float notation when doing
fixed-point stuff in assembler... it uses fewer digits than binary, but
is easily visualised as bits.

However, I do think that we could go overboard here... I don't know
that we really need arbitrary base string formatting for all numeric
types. I think this is a case of "very little gained for too much added
complexity". If we really do, and someone is prepared to implement it,
then I think adding "@base" is the best way to do it (see my half
joking post earlier). If we only want arbitrary bases for integer
types, the best way would be to leverage off the existing ".precision"
so that it means ".base" for "%d".

>> In-favour-of-%2b-ly y'rs,
>
> My only opposition to this is that the byte type may want to use it.
> I'd rather wait until byte is fully defined, implemented, and released
> in a python version before that option is taken away.

There's always "B" for bytes and "b" for bits... though I can't imagine
why byte would need its own conversion type.

I'm not entirely sure everyone is on the same page for "%b" here... it
would only be a shorthand for "binary" in the same way that "%x" is for
"hexadecimal". It would not support arbitrary bases, and thus "%2b"
would mean a binary string with a minimum length of 2 characters.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] str with base
On Tue, 2006-01-17 at 10:05 +0000, Nick Craig-Wood wrote:
> On Mon, Jan 16, 2006 at 11:13:27PM -0500, Raymond Hettinger wrote:
[...]
> Another suggestion would be to give hex() and oct() another parameter,
> base, so you'd do hex(123123123, 2). Perhaps a little
> counter-intuitive, but if you were looking for base conversion
> functions you'd find hex() pretty quickly and the documentation would
> mention the other parameter.

Ugh! I still favour extending % format strings. I really like '%b' for
binary, but if arbitrary bases are really wanted, then perhaps also
leverage off the "precision" value for %d to indicate base, such that

    '% 3.3d' % 5 == " 12"

If people think that using "." is for "precision" and is too ambiguous
for "base", you could do something like extend the whole conversion
specifier to (in EBNF) [EMAIL PROTECTED], which would allow for weird
things like "[EMAIL PROTECTED]" % 5.5 == " 12."

Note: it is possible for floats to be represented in non-decimal number
systems, it's just extremely rare for anyone to do it. I have in my
distant past used base-16 float notation for fixed-point numbers.

I personally think %b would be adding enough. The other suggestions are
just me being silly :-)

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] str with base
On Tue, 2006-01-17 at 01:03 -0500, Barry Warsaw wrote:
> On Mon, 2006-01-16 at 20:49 -0800, Bob Ippolito wrote:
>
>> The only bases I've ever really had a good use for are 2, 8, 10, and
>> 16. There are currently formatting codes for 8 (o), 10 (d, u), and
>> 16 (x, X). Why not just add a string format code for unsigned
>> binary? The obvious choice is probably "b".
>>
>> For example:
>>
>>     >>> '%08b' % (12)
>>     '00001100'
>>     >>> '%b' % (12)
>>     '1100'
>
> +1

+1 me too.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] [Python-checkins] commit of r41880 - python/trunk/Python/Python-ast.c
On Mon, 2006-01-02 at 15:16 -0800, Neal Norwitz wrote:
> On 1/2/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
>> I think we have a fundamental problem with Python-ast.c and
>> Python-ast.h. These files should not be both auto-generated and
>> checked into Subversion.
>
> I agree with the problem statement.
>
>> The general rule should be that no file that is ever generated can be
>> checked into Subversion. Probably the right approach is to check in a
>> template file that will not get removed by a distclean, and modify
>> the build process to generate Python-ast.* from those template files.
>
> I'm not sure about your proposed solution, though.
>
> There's a bootstrapping issue. Python-ast.[ch] are generated by a
> python 2.2+ script. /f created a bug report if only 2.1 is available.
>
> The Python-ast.[ch] should probably not be removed by distclean. This
> is similar to configure. Would that make you happy? What else would
> improve the current situation?
>
> If you go the template route, you would just copy the files. That
> doesn't seem to gain anything.

The solution I use is to never have anything auto-generated in CVS/SVN,
but have "make dist" generate and include anything needed for
bootstrapping in the distribution tarball (or whatever). Doing "make
distclean" should delete enough to bring you back to a freshly
extracted distribution tarball, and "make maintainer-clean" should
delete all auto-generated files to bring you back to a clean CVS/SVN
checkout.

I tend to include quite a few generated files in the distribution
tarball that are not in CVS/SVN. Things like ChangeLog (generated by
cvs2cl), all the autotools autogen'ed files, generated datafiles, etc.

This way your source distributions don't have any bootstrap problems,
but you also don't have any auto-generated files in CVS/SVN and the
problems they create. It does however mean that a fresh CVS/SVN
checkout does have additional build requirements above and beyond
building from a source distribution.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] [Doc-SIG] that library reference, again
[...] workflow rather than the "submit bug" workflow, and maybe that
will make things easier for the big picture "update and release docs"
workflow. But the speed of the tool-chain has little to do with this,
only the "documentation language" popularity among the developers and
users.

...and if the LaTeX guys don't mind fixing bugs instead of applying
patches and are handling the load... the status quo is fine by me, I'm
happy not to do documentation :-)

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] When do sets shrink?
On Thu, 2005-12-29 at 17:17 +0100, Fredrik Lundh wrote:
> Noam Raphael wrote:
>
>> I'm not saying that practically it must be used - I'm just saying
>> that it can't be called a heuristic, and that it doesn't involve any
>> "fancy overkill size hinting or history tracking". It actually means
>> something like this:
>> 1. If you want to insert and the table is full, resize the table to
>> twice the current size.
>> 2. If you delete and the number of elements turns out to be less than
>> a quarter of the size of the table, resize the table to half of the
>> current size.
>
> sure sounds like a heuristic algorithm to me... (as in "not guaranteed
> to be optimal under all circumstances, even if it's probably quite
> good in all practical cases")

My problem with this heuristic is it doesn't work well for the
probably-fairly-common use-case of: fill it, empty it, fill it, empty
it, fill it... etc.

As Guido pointed out, if you do have a use-case where a container gets
very big, then shrinks and doesn't grow again, you can manually clean
up by creating a new container and del'ing the old one. If the
implementation is changed to use this heuristic, there is no simple way
to avoid the re-allocs for this use-case... (don't empty, but fill with
None... ugh!).

My gut feeling is this heuristic will cause more pain than it would
gain...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
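A sketch of the quoted resize rule, just to make the fill/empty cost
concrete (illustrative only -- the real dict/set tables are C code and
differ in detail):

    def after_insert(used, size):
        # Grow: table full -> double it.
        if used >= size:
            size *= 2      # copies every entry to the new table
        return size

    def after_delete(used, size):
        # Shrink: under a quarter full -> halve it.
        if used < size // 4:
            size //= 2     # another copy; fill/empty loops pay both ways
        return size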
Re: [Python-Dev] file() vs open(), round 7
On Sun, 2005-12-25 at 20:38 -0800, Aahz wrote:
> Guido sez in
> http://mail.python.org/pipermail/python-dev/2004-July/045921.html
> that it is not correct to recommend using ``file()`` instead of
> ``open()``. However, because ``open()`` currently *is* an alias to
> ``file()``, we end up with the following problem (verified in current
> HEAD) where doing ``help(open)`` brings up the docs for ``file()``:
[...]
> This is confusing. I suggest that we make ``open()`` a factory
> function right now. (I'll submit a bug report (and possibly a patch)
> after I get agreement.)

Not totally related, but... way back in 2001-2002, I did some work on
writing a Virtual File System interface for Python. See:

    http://minkirri.apana.org.au/~abo/projects/osVFS

The idea was that you could import a module "vfs" as "os", and then any
file operations would go through the virtual file system. I had modules
for things like "fakeroot", "mountable", "ftpfs" etc. The vfs module
had full os functionality so it was a "drop in replacement".

The one wart was open(), because it is the only filesystem operation
that isn't in the os module. At the time I worked around this by adding
a vfs.file() method, and suggesting that people alias open() to
vfs.file(). Note that os.open() already exists as a low-level file open
function, and hence could not be used as a file-object-factory method.

I'm wondering if it wouldn't be a good idea to centralise all
filesystem operations into the os module, including open() or file().
Perhaps the builtin open() and file() could call os.file()... or P3K
could remove the builtins... I dunno... it just felt ugly at the time
that open() was the one oddity.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
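A minimal sketch of open() as a true factory function rather than an
alias (hypothetical -- not how CPython actually spells it):

    def open(name, mode='r', buffering=-1):
        """Open a file and return a file object."""
        # A real factory could dispatch to a VFS here; either way,
        # help(open) now shows open()'s docs rather than file()'s.
        return file(name, mode, buffering)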
Re: [Python-Dev] When do sets shrink?
On Wed, 2005-12-28 at 18:57 -0500, Raymond Hettinger wrote:
[...]
> What could be done is to add a test for excess dummy entries and
> trigger a periodic resize operation. That would make the memory
> available for other parts of the currently running script and possibly
> available for the O/S.
>
> The downside is slowing down a fine-grained operation like pop(). For
> dictionaries, this wasn't considered worth it. For sets, I made the
> same design decision. It wasn't an accident. I don't plan on changing
> that decision unless we find a body of real world code that would be
> better off with more frequent re-sizing.

I don't think it will ever be worth it. Re-allocations that grow are
expensive, as they often need to move the entire contents from the old
small allocation to the new larger allocation. Re-allocations that
shrink can also be expensive, or at the least increase heap
fragmentation. So you want to avoid re-allocations whenever possible.

The ideal size for any container is "as big as it needs to be". The
best heuristic for this is probably "as big as it's ever been, or if it
just got bigger than that, assume it's half way through growing", which
is what Python currently does. Without some sort of fancy overkill size
hinting or history tracking, that's probably as good a heuristic as you
can get.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] (no subject)
On Thu, 2005-11-24 at 14:11 +0000, Duncan Grisby wrote:
> Hi,
>
> I posted this to comp.lang.python, but got no response, so I thought I
> would consult the wise people here...
>
> I have encountered a problem with the re module. I have a
> multi-threaded program that does lots of regular expression searching,
> with some relatively complex regular expressions. Occasionally, events
> can conspire to mean that the re search takes minutes. That's bad
> enough in and of itself, but the real problem is that the re engine
> does not release the interpreter lock while it is running. All the
> other threads are therefore blocked for the entire time it takes to do
> the regular expression search.

I don't know if this will help, but in my experience compiling re's
often takes longer than matching them... are you sure that it's the
match and not a compile that is taking a long time? Are you using
pre-compiled re's, or are you dynamically generating strings and using
them?

> Is there any fundamental reason why the re module cannot release the
> interpreter lock, for at least some of the time it is running? The
> ideal situation for me would be if it could do most of its work with
> the lock released, since the software is running on a multi processor
> machine that could productively do other work while the re is being
> processed. Failing that, could it at least periodically release the
> lock to give other threads a chance to run?
>
> A quick look at the code in _sre.c suggests that for most of the time,
> no Python objects are being manipulated, so the interpreter lock could
> be released. Has anyone tried to do that?

Probably not... not many people would have several-minutes-to-match
re's. I suspect it would be do-able... I suggest you put together a
patch and submit it on SF...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
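The precompiling suggestion, concretely (the pattern here is made up;
nested quantifiers like this are a classic way to get multi-minute
backtracking):

    import re

    # Compile once, up front -- compiling complex patterns can cost
    # more than many individual searches.
    PATTERN = re.compile(r"(a+)+b")

    def scan(lines):
        for line in lines:
            if PATTERN.search(line):
                yield line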
Re: [Python-Dev] urlparse brokenness
On Tue, 2005-11-22 at 23:04 -0600, Paul Jimenez wrote:
> It is my assertion that urlparse is currently broken. Specifically, I
> think that urlparse breaks an abstraction boundary with ill effect.
>
> In writing a mailclient, I wished to allow my users to specify their
> imap server as a url, such as 'imap://user:[EMAIL PROTECTED]:port/'.
> Which worked fine. I then thought that the natural extension to
> support

FWIW, I have a small addition related to this that I think would be
handy to add to the urlparse module. It is a pair of functions,
netlocparse() and netlocunparse(), for parsing and unparsing
"user:[EMAIL PROTECTED]:port" netlocs. Feel free to use/add/ignore it...

    http://minkirri.apana.org.au/~abo/projects/osVFS/netlocparse.py

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
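A rough reimplementation of the pair, to show the idea (the real code
is at the URL above; this sketch assumes the standard
"user:password@host:port" netloc form):

    def netlocparse(netloc):
        # "user:password@host:port" -> (user, password, host, port)
        user = password = port = ''
        if '@' in netloc:
            auth, netloc = netloc.split('@', 1)
            if ':' in auth:
                user, password = auth.split(':', 1)
            else:
                user = auth
        if ':' in netloc:
            host, port = netloc.split(':', 1)
        else:
            host = netloc
        return user, password, host, port

    def netlocunparse(parts):
        user, password, host, port = parts
        auth = user
        if password:
            auth += ':' + password
        netloc = host
        if port:
            netloc += ':' + port
        if auth:
            netloc = auth + '@' + netloc
        return netloc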
Re: [Python-Dev] Coroutines (PEP 342)
On Mon, 2005-11-14 at 15:46 -0700, Bruce Eckel wrote:
[...]
> What is not clear to me, and is not discussed in the PEP, is whether
> coroutines can be distributed among multiple processors. If that is or
> isn't possible I think it should be explained in the PEP, and I'd be
> interested in knowing about it here (and ideally why it would or
> wouldn't work).

Even if different coroutines could be run on different processors,
there would be nothing gained except the extra overheads of
interprocessor memory duplication and communication delays. The whole
process of communication via yield and send effectively means only one
coroutine is running at a time, and all the others are blocked waiting
for a yield or send.

This was the whole point; it is a convenient abstraction that appears
to do work in parallel while actually doing it sequentially, avoiding
the overheads and possible race conditions of threads. It has the
problem that a single coroutine can monopolise execution, hence the
other name "co-operative multi-tasking", where co-operation is a
requirement for it to work.

At least... that's the way I understood it... I could be totally
mistaken...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
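A tiny PEP 342 illustration of that handshake -- exactly one side runs
at a time (Python 2.5 syntax):

    def accumulator():
        total = 0
        while True:
            value = (yield total)   # suspend here; resume on send()
            total += value

    acc = accumulator()
    acc.next()           # prime it: run to the first yield
    print acc.send(10)   # -> 10
    print acc.send(32)   # -> 42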
Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter
On Thu, 2005-11-10 at 00:40 -0500, Michiel Jan Laurens de Hoon wrote:
> Stephen J. Turnbull wrote:
>
>> Michiel> What is the advantage of Tk in comparison to other GUI
>> Michiel> toolkits?
[...]
> My application doesn't need a toolkit at all. My problem is that
> because of Tkinter being the standard Python toolkit, we cannot have a
> decent event loop in Python. So this is the disadvantage I see in
> Tkinter.
[...]

I'm kind of surprised no-one has mentioned Twisted in this thread.
Twisted is an async framework that I believe has support for using a
variety of different event loops, including Tkinter and wxWidgets, as
well as its own. It has been heavily re-factored many times, so if you
want to see the current Python "state of the art" way of doing this,
I'd have a look at what they are doing.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Pythonic concurrency
On Mon, 2005-10-10 at 18:59, Bill Janssen wrote:
>> The problem with threads is at first glance they appear easy...
>
> Anyone who thinks that a "glance" is enough to understand something is
> too far gone to worry about. On the other hand, you might be
> referring to a putative brokenness of the Python documentation on
> Python threads. I'm not sure they're broken, though. They just point
> out the threading that Python provides, for folks who want to use
> threads. Are they required to provide a full course in threads?

I was speaking in general, not about Python in particular. If anything,
Python is one of the simplest and safest platforms for threading
(thanks mostly to the GIL). And I find the documentation excellent :-)

>> ...which seduces many beginning programmers into using them.
>
> Don't worry about this. That's how "beginning programmers" learn.

Many other things "beginning programmers" learn very quickly break if
you do them wrong, until you learn to do them right. Threads are tricky
in that they can "mostly work", and it can be a long while before you
realise they are actually broken. I don't know how many bits of other
people's code I've had to fix that worked for years until they were run
on hardware fast enough to trigger that nasty race condition :-)

--
Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] Pythonic concurrency
On Fri, 2005-10-07 at 17:47, Bruce Eckel wrote:
> Early in this thread there was a comment to the effect that "if you
> don't know how to use threads, don't use them," which I pointedly
> avoided responding to because it seemed to me to simply be
> inflammatory. But Ian Bicking just posted a weblog entry:
> http://blog.ianbicking.org/concurrency-and-processes.html where he
> says "threads aren't as hard as they imply" and "An especially poor
> argument is one that tells me that I'm currently being beaten with a
> stick, but apparently don't know it."

The problem with threads is that at first glance they appear easy,
which seduces many beginning programmers into using them. The hard part
is knowing when and how to lock shared resources... at first glance you
don't even realise you need to do this. So many threaded applications
are broken and don't know it, because this kind of broken-ness is
nearly always intermittent and very hard to reproduce and debug.

One common alternative is async polling frameworks like Twisted. These
scare beginners away because at first glance they appear hideously
complicated. However, if you take the time to get your head around
them, you get a better feel for all the nasty implications of
concurrency, and end up designing better applications.

This is the reason why, given a choice between an async and a threaded
implementation of an application, I will always choose the async
solution. Not because async is inherently better than threading, but
because the programmer who bothered to grok async is more likely to get
it right.

--
Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] Pythonic concurrency
On Fri, 2005-10-07 at 23:54, Nick Coghlan wrote:
[...]
> The few times I have encountered anyone saying anything resembling
> "threading is easy", it was because the full sentence went something
> like "threading is easy if you use message passing and copy-on-send or
> release-reference-on-send to communicate between threads, and limit
> the shared data structures to those required to support the messaging
> infrastructure". And most of the time there was an implied "compared
> to using semaphores and locks directly, " at the start.

LOL! So threading is easy if you restrict inter-thread communication to
message passing... and what makes multi-processing hard is your only
inter-process communication mechanism is message passing :-)

Sounds like yet another reason to avoid threading and use processes
instead... effort spent on threading-based message passing
implementations could instead be spent on inter-process messaging.

--
Donovan Baarda <[EMAIL PROTECTED]>
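The message-passing style in question maps directly onto the stdlib
Queue module (a small sketch):

    import threading
    import Queue

    inbox = Queue.Queue()   # the only shared structure is the queue

    def worker():
        while True:
            msg = inbox.get()
            if msg is None:     # sentinel: shut down
                break
            print "worker got:", msg

    t = threading.Thread(target=worker)
    t.start()
    inbox.put("hello")
    inbox.put(None)
    t.join()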
Re: [Python-Dev] GIL, Python 3, and MP vs. UP
On Tue, 2005-09-20 at 22:43, Guido van Rossum wrote:
> On 9/20/05, John J Lee <[EMAIL PROTECTED]> wrote:
[...]
> I don't know that any chips are designed with threading in mind. Fast
> threading benefits from fast context switches which benefits from
> small register sets. I believe the trend is towards ever larger
> register sets. Also, multiple processors with shared memory don't
> scale all that well; multiple processors with explicit IPC channels
> scale much better. All arguments for multi-processing and against
> multi-threading.

Exactly! I believe the latest MP Opteron chipsets use HyperTransport
busses to directly access the other processor's memory and possibly CPU
cache. In theory this means shared memory will not hurt too badly,
helping threading. However, memory contention bottlenecks and cache
coherency will always mean shared memory hurts more, and will never
scale better, than IPC.

The reality is threads were invented as a low overhead way of easily
implementing concurrent applications... ON A SINGLE PROCESSOR. Taking
into account threading's limitations and objectives, Python's GIL is
the best way to support threads. When hardware (seriously) moves to
multiple processors, other concurrency models will start to shine.

In the short term there will be various hacks to try and make the
existing plethora of threading applications run better on multiple
processors, but ultimately the overheads of shared memory will force
serious multi-processing to use IPC channels. If you want serious MP,
use processes, not threads.

I see anti-GIL threads again and again. Get over it... the GIL rocks
for threads :-)

--
Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] reference counting in Py3K
On Wed, Sep 07, 2005 at 02:01:01AM -0400, Phillip J. Eby wrote: [...] > Just an FYI; Pyrex certainly makes it relatively painless to write code > that interfaces with C, but it doesn't do much for performance, and > naively-written Pyrex code can actually be slower than carefully-optimized > Python code. So, for existing modules that were written in C for > performance reasons, Pyrex isn't currently a substitute. I just want to second this; my experiments with Pyrex on pysync produced no speedups. I got a much more noticeable speed benefit from psyco. This was admittedly a long time ago... -- Donovan Baarda http://minkirri.apana.org.au/~abo/
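For reference, the psyco usage behind that comparison is about as minimal as it gets (a sketch; the bound function name is hypothetical):

import psyco
psyco.full()                    # JIT-specialise the whole program, or...
# psyco.bind(rolling_checksum)  # ...just the hot function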
Re: [Python-Dev] Remove str.find in 3.0?
On Sat, 2005-08-27 at 10:16 -0700, Josiah Carlson wrote: > Guido van Rossum <[EMAIL PROTECTED]> wrote: [...] > Oh, there's a good thing to bring up; regular expressions! re.search > returns a match object on success, None on failure. With this "failure > -> Exception" idea, shouldn't they raise exceptions instead? And > goodness, defining a good regular expression can be quite hard, possibly > leading to not insignificant "my regular expression doesn't do what I > want it to do" bugs. Just look at all of those escape sequences and the > syntax! It's enough to make a new user of Python gasp. I think re.match() returning None is an example of 1b (as categorised by Terry Reedy). In this particular case a 1b style response is OK. Why: 1) any successful match evaluates to "True", and None evaluates to "False". This allows simple code like;

if myreg.match(s):
    # do something

Note you can't do this for find, as 0 is a successful "find" and evaluates to False, whereas other results including -1 evaluate to True. Even worse, -1 is a valid index. 2) exceptions are for unexpected events, where unexpected means "much less likely than other possibilities". The re.match() operation asks "does this match this", which implies you have an about even chance of not matching... ie a failure to match is not unexpected. The result None makes sense... "what match did we get? None, OK". For str.index() you are asking "give me the index of this inside this", which implies you expect it to be in there... ie not finding it _is_ unexpected, and should raise an exception. Note that re.match() returning None will still raise exceptions if the rest of your code doesn't expect it;

index = myreg.match(s).start()
tail = s[index:]

This will raise an exception if there was no match. Unlike str.find();

index = s.find(r)
tail = s[index:]

which will happily return the last character if there was no match. This is why find() should return None instead of -1. > With the existence of literally thousands of uses of .find and .rfind in > the wild, any removal consideration should be weighed heavily - which > honestly doesn't seem to be the case here with the ~15 minute reply time > yesterday (just my observation and opinion). If you had been ruminating > over this previously, great, but that did not seem clear to me in your > original reply to Terry Reedy. Bear in mind they are talking about Python 3.0... I think :-) -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
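A concrete demonstration of the two failure modes described above:

import re

s = "spam"
i = s.find("x")        # no match, so i == -1...
print s[i:]            # ...but -1 is a valid index: prints "m"

m = re.match("x", s)   # no match, so m is None...
m.start()              # ...which fails loudly with an AttributeError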
Re: [Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)
On Wed, 2005-08-24 at 07:33, "Martin v. Löwis" wrote: > Walter Dörwald wrote: > > Martin v. Löwis wrote: > > > >> Walter Dörwald wrote: [...] > Actually, on a second thought - it would not remove the quadratic > aspect. You would still copy the rest string completely on each > split. So on the first split, you copy N lines (one result line, > and N-1 lines into the rest string), on the second split, N-2 > lines, and so on, totalling N*N/2 line copies again. The only > thing you save is the join (as the rest is already joined), and > the IsLineBreak calls (which are necessary only for the first > line). [...] In the past, I've avoided the string copy overhead inherent in split() by using buffers... I've always wondered why Python didn't use buffer-type tricks internally for split-type operations. I haven't looked at Python's string implementation, but the fact that strings are immutable surely means that you can safely and efficiently reference an implementation-level "data" object for all strings... ie all strings are "buffers". The only problem I can see with this is huge "data" objects might hang around just because some small fragment of them is still referenced by a string. Surely a simple heuristic or two like "if len(string) < len(data)/8: copy data; else: reference data" would go a long way towards avoiding that. In my limited playing around with manipulating strings and benchmarking stuff, the biggest overhead is nearly always the copies. -- Donovan Baarda <[EMAIL PROTECTED]>
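The buffer() trick referred to above might look something like this minimal sketch (Python 2; the helper name is made up): each "line" is a cheap view into the original string rather than a copy, at the cost of keeping the whole string alive.

def iterlines(data):
    # yield one buffer (view) per line, copying nothing
    pos = 0
    while pos < len(data):
        end = data.find("\n", pos)
        if end < 0:
            end = len(data) - 1
        yield buffer(data, pos, end + 1 - pos)
        pos = end + 1

print [str(l) for l in iterlines("one\ntwo\nthree\n")]
# ['one\n', 'two\n', 'three\n']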
Re: [Python-Dev] Fwd: Distributed RCS
On Mon, 2005-08-15 at 04:30, Benji York wrote: > Martin v. Löwis wrote: > > [EMAIL PROTECTED] wrote: > >>Granted. What is the cost of waiting a bit longer to see if it (or > >>something else) gets more usable and would hit the mark better than svn? > > > > It depends on what "a bit" is. Waiting a month would be fine; waiting > > two years might be pointless. > > This might be too convoluted to consider, but I thought I might throw it > out there. We use svn for our repositories, but I've taken to also > using bzr so I can do local commits and reversions (within a particular > svn revision). I can imagine expanding that usage to sharing branches > and such via bzr (or mercurial, which looks great), but keeping the > trunk in svn. Not too convoluted at all; I already do exactly this with many upstream CVS and SVN repositories, using a local PRCS for my own branches. I'm considering switching to a distributed RCS for my own branches because it would make it easier for others to share them. I think this probably is the best solution; it gives a reliable(?) centralised RCS for the trunk, but allows distributed development. -- Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
On Mon, 2005-08-08 at 17:51, Trent Mick wrote: [...] > [Donovan Baarda wrote] > > On Mon, 2005-08-08 at 15:49, Trent Mick wrote: [...] > You want to do checkins of code in a consistent state. Some large changes > take a couple of days to write. During which one may have to do a couple > minor things in unrelated sections of a project. Having some mechanism > to capture some thoughts and be able to say "these are the relevant > source files for this work" is handy. Creating a branch for something > that takes a couple of days is overkill. [...] I don't need to check in a consistent state if I'm working on a separate branch. I can check in any time I want to record a development checkpoint... I can capture the thoughts in the version history of the branch. > The alternative being either that you have separate branches for > everything (can be a pain) or just check-in for review (possibly It all comes down to how painless branch/merge is. Many esoteric "features" of version control systems feel like they are there to work around the absence of proper branch/merge histories. Note: SVN doesn't have branch/merge histories either. -- Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
On Mon, 2005-08-08 at 15:49, Trent Mick wrote: > One feature I like in Perforce (which Subversion doesn't have) is the > ability to have pending changesets. A changeset is, as with subversion, > something you check-in atomically. Pending changesets in Perforce allow > you to (1) group related files in a source tree where you might be > working on multiple things at once to ensure and (2) to build a change > description as you go. In a large source tree this can be useful for > separating chunks of work. This seems like a poor workaround for crappy branch/merge support. I'm new to Perforce, but the pending changesets seem dodgy to me... you are accumulating changes gradually without recording any history during the process... ie, no checkins until the end. Even worse, Perforce seems to treat clients like "unversioned branches", allowing you to review and test pending changesets in other clients. This supposedly allows people to review/test each other's changes before they are committed. The problem is, since these changes are not committed, there is no firm history of what was reviewed/tested vs what gets committed... ie they could be different. Having multiple different pending changesets in one large source tree also feels like a workaround for high client overheads. Trying to develop and test a mixture of different changes in one source tree is asking for trouble... they can interact. Maybe I just haven't grokked Perforce yet... which might be considered a black mark against its learning curve :-) For me, the logical way to group a collection of changes is in a branch. This allows you to commit and track the history of the collection of changes. You check out each branch into different directories and develop/test them independently. The branch can then be reviewed and merged when it is complete. -- Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
Martin v. Löwis wrote: > Donovan Baarda wrote: > >>Yeah. IMHO the saddest thing about SVN is it doesn't do branch/merge >>properly. All the other cool stuff like renames etc is kinda undone by >>that. For a definition of properly, see; >> >>http://prcs.sourceforge.net/merge.html > > > Can you please elaborate? I read the page, and it seems to me that > subversion's merge command works exactly the way described on the > page. Maybe it's changed since I last looked at it, but last time I looked SVN didn't track merge histories. From the svnbook; "Unfortunately, Subversion is not such a system. Like CVS, Subversion 1.0 does not yet record any information about merge operations. When you commit local modifications, the repository has no idea whether those changes came from running svn merge, or from just hand-editing the files." What this means is SVN has no way of automatically identifying the common version. An svn merge requires you to manually identify and specify the "last common point" where the branch was created or last merged. PRCS automatically finds the common version from the branch/merge history, and even remembers the merge/replace/nothing/delete decision you make for each file as the default to use for future merges. You can see this in the command line differences. For Subversion;

# create and checkout branch my-calc-branch
$ svn copy http://svn.example.com/repos/calc/trunk \
      http://svn.example.com/repos/calc/branches/my-calc-branch \
      -m "Creating a private branch of /calc/trunk."
$ svn checkout http://svn.example.com/repos/calc/branches/my-calc-branch

# merge and commit changes from trunk
$ svn merge -r 341:HEAD http://svn.example.com/repos/calc/trunk
$ svn commit -m "Merged trunk changes to my-calc-branch."

# merge and commit more changes from trunk
$ svn merge -r 345:HEAD http://svn.example.com/repos/calc/trunk
$ svn commit -m "Merged trunk changes to my-calc-branch."

Note that 341 and 345 are "magic" version numbers which correspond to the trunk version at the time of branch and first merge respectively. It is up to the user to figure out these versions using either meticulous use of tags or svn logs. In PRCS;

# create and checkout branch my-calc-branch
$ prcs checkout calc -r 0
$ prcs checkin -r my-calc-branch -m "Creating my-calc-branch"

# merge and commit changes from trunk
$ prcs merge -r 0
$ prcs checkin -m "Merged changes from trunk"

# merge and commit more changes from trunk
$ prcs merge -r 0
$ prcs checkin -m "Merged changes from trunk"

Note that "-r 0" means "HEAD of trunk branch", and "-r my-calc-branch" means "HEAD of my-calc-branch". There is no need to figure out what versions of those branches to use as the "changes from" point, because PRCS figures it out for you. Not only that, but if you chose to ignore changes in certain files during the first merge, PRCS will remember that as the default action for the second merge. -- Donovan Baarda
Re: [Python-Dev] Syscall Proxying in Python
On Tue, 2005-08-02 at 11:59, Gabriel Becedillas wrote: > Donovan Baarda wrote: [...] > > Wow... you guys sure did it the hard way. If you had done it at the > > Python level, you would have had a much easier time of both implementing > > and updating it. [...] > Hi, thanks for your reply. > The problem I see with the approach you're suggesting is that I have to > rewrite a lot of code to make it work the way I want. We already have > the syscall proxying stuff with an stdio layer on top of it. I should > have to rewrite some parts of some modules and use my own versions of > stdio functions, and that is pretty much the same as we have done before. > There are also native objects that use stdio functions, and I should > replace those ones too, or modules that have some native code that uses > stdio, or sockets. I should duplicate those files, and make the same > kind of search/replace work that we have done previously and that we'd > like to avoid. > Please let me know if I misunderstood you. Nope... you got it all figured out. I guess it depends on what degree of "proxying" you want... I thought there was some stuff you wanted redirected, and some you didn't. The point is, you _can_ do this at the Python level, and you only have to modify Python code, not C Python source. However, if you want to proxy everything, then the glib wrapper is probably the best approach, provided you really want to code in C and have your own Python binary. -- Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
On Tue, 2005-08-02 at 09:06, François Pinard wrote: > [Raymond Hettinger] > > > >http://www.venge.net/monotone/ > > > The current release is 0.21 which suggests that it is not ready for > > primetime. > > It suggests it, yes, and to me as well. On the other hand, there is > a common prejudice that something requires many releases, or frequent > releases, to be qualified as good. While it might be true on average, > this is not necessarily true: some packages need not so many steps for > becoming very usable, mature or stable. (Note that I'm not asserting > anything about Monotone, here.) We should merely keep an open mind. It is true that some well designed/developed software becomes reliable very quickly. However, it still takes heavy use over time to prove that. You don't want to be the guy who finds out that this is not one of those bits of software. IMHO you need maturity for revision control software... you are relying on it for history. The only open source options worth considering for Python are CVS and SVN, and even SVN is questionable (see bdb backend issues). -- Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] Syscall Proxying in Python
On Mon, 2005-08-01 at 10:36, Gabriel Becedillas wrote: > Hi, > We embedded Python 2.0.1 in our product a few years ago and we'd like to > upgrade to Python 2.4.1. This was not a simple task, because we needed > to execute syscalls on a remote host. We modified Python's source code > in several places to call our own versions of some functions. For > example, instead of calling fopen(...), the source code was modified to > call remote_fopen(...), and the same was done with other libc functions. > Socket functions were hooked too (we modified socket.c), Windows > Registry functions, etc.. Wow... you guys sure did it the hard way. If you had done it at the Python level, you would have had a much easier time of both implementing and updating it. As an example, have a look at my osVFS stuff. This is a replacement for the os module and open() that tricks Python into using a virtual file system; http://minkirri.apana.org.au/~abo/projects/osVFS -- Donovan Baarda <[EMAIL PROTECTED]>
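The flavour of that approach is roughly this (a much-simplified hypothetical sketch, not the actual osVFS code): intercept open() and the os path functions at the Python level and remap paths into a virtual root.

import os, __builtin__

VFS_ROOT = "/tmp/sandbox"   # hypothetical virtual root

def _map(path):
    # remap a path into the virtual root
    return os.path.join(VFS_ROOT, path.lstrip("/"))

_real_open = __builtin__.open
def _vfs_open(name, mode="r", bufsize=-1):
    return _real_open(_map(name), mode, bufsize)
__builtin__.open = _vfs_open

# the same wrapping idea applies to os.stat, os.listdir, os.remove, etc.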
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
On Sun, 2005-07-31 at 23:54, Stephen J. Turnbull wrote: > >>>>> "BAW" == Barry Warsaw <[EMAIL PROTECTED]> writes: > > BAW> So are you saying that moving to svn will let us do more long > BAW> lived branches? Yay! > > Yes, but you still have to be disciplined about it. svn is not much > better than cvs about detecting and ignoring spurious conflicts due to > code that gets merged from branch A to branch B, then back to branch > A. Unrestricted cherry-picking is still out. Yeah. IMHO the saddest thing about SVN is it doesn't do branch/merge properly. All the other cool stuff like renames etc is kinda undone by that. For a definition of properly, see; http://prcs.sourceforge.net/merge.html This is why I don't bother migrating any existing CVS projects to SVN; the benefits don't yet outweigh the pain of migrating. For new projects sure, SVN is a better choice than CVS. -- Donovan Baarda <[EMAIL PROTECTED]>
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
On Mon, 2005-06-27 at 14:25, Phillip J. Eby wrote: [...] > As for the open issues, if we can't reach some sane compromise about > atime/ctime/mtime, I'd suggest just providing the stat() method and let > people use stat().st_mtime et al. Alternately, I'd be okay with creating > last_modified(), last_accessed(), and created_on() methods that return > datetime objects, as long as there's also atime()/mtime()/ctime() methods > that return timestamps.

+1 for atime/mtime/ctime being timestamps
-1 for redundant duplicates that return DateTimes
+1 for a stat() method (there are lots of other goodies in a stat)

-- Donovan Baarda <[EMAIL PROTECTED]>
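For illustration, keeping timestamps loses nothing, since a DateTime is one call away (a sketch; the path is arbitrary):

import os, datetime

st = os.stat("setup.py")
print st.st_mtime                                    # plain float timestamp
print datetime.datetime.fromtimestamp(st.st_mtime)   # DateTime on demand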
Re: [Python-Dev] Recommend accepting PEP 312 -- Simple Implicit Lambda
Josiah Carlson wrote: > Donovan Baarda <[EMAIL PROTECTED]> wrote: > >>Nick Coghlan wrote: >> >>>Donovan Baarda wrote: [...] >>But isn't a function just a deferred expression with a name :-) > > > A function in Python is actually a deferred sequence of statements and > expressions. An anonymous function in Python (a lambda) is a deferred > expression. In the end though, a sequence of statements that completes with a "return value" is, when treated as a black box, indistinguishable from an expression. Originally I thought that this also had to be qualified with "and has no side-effects", but I see now that is not the case. [...] >>Oh yeah Raymond: on the "def defines some variable name"... are you >>joking? You forgot the smiley :-) > > > 'def' happens to bind the name that follows the def to the function with > the arguments and body following the name. Yeah, but we don't use "def" to bind arbitrary variables, only functions/procedures. So in Python, they are intimately identified with functions and procedures. >>I don't get what the problem is with mixing statement and expression >>semantics... from a practical point of view, statements just offer a >>superset of expression functionality. > > > Statements don't have a return value. To be more precise, what is the > value of "for i in xrange(10): z.append(...)"? Examine the selection of > statements available to Python, and ask that question. The only one > that MAY have a return value, is 'return' itself, which really requires > an expression to the right (which passes the expression to the right to > the caller's frame). When you have statements that ultimately need a > 'return' for a return value; you may as well use a standard function > definition. Hmmm. For some reason I thought that these kinds of things would have a return value of None, the same as a function without an explicit return. I see now that this is not true... >>If there really is a serious practical reason why they must be limited >>to expressions, why not just raise an exception or something if the >>"anonymous function" is too complicated... > > > Define "too complicated"? I was thinking that this is up to the interpreter... depending on what the practical limitations are that cause the limitation in the first place. For example... if it can't be reduced to an "expression" through simple transforms. But look... I've gone and created another monster thread on "alternatives to lambda"... I'm going to shut up now. -- Donovan Baarda
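A minimal demonstration of that side-effects point: lambda bodies are limited to expressions, yet an expression can still mutate state.

log = []
f = lambda x: log.append(x) or x > 1   # append() returns None, so the
print filter(f, [1, 2, 3])             # 'or' falls through: prints [2, 3]
print log                              # the side effect: [1, 2, 3]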
Re: [Python-Dev] Recommend accepting PEP 312 -- Simple Implicit Lambda
Nick Coghlan wrote: > Donovan Baarda wrote: > >>As I see it, a lambda is an anonymous function. An anonymous function is >>a function without a name. > > > And here we see why I'm such a fan of the term 'deferred expression' > instead of 'anonymous function'. But isn't a function just a deferred expression with a name :-) As a person who started out writing assembler where every "function" I wrote was a macro that got expanded inline, the distinction is kinda blurry to me. > Python's lambda expressions *are* the former, but they are > emphatically *not* the latter. Isn't that because lambdas have the limitation of not allowing statements, only expressions? I know this limitation avoids side-effects and has significance in some formal (functional?) languages... but is that what Python is? In the Python I use, lambdas are always used where you are too lazy to define a function to do the job. To me, anonymous procedures/functions would be a superset of "deferred expressions", and if the one stone fits perfectly in the slingshot we have and can kill multiple birds... why hunt for another stone? Oh yeah Raymond: on the "def defines some variable name"... are you joking? You forgot the smiley :-) I don't get what the problem is with mixing statement and expression semantics... from a practical point of view, statements just offer a superset of expression functionality. If there really is a serious practical reason why they must be limited to expressions, why not just raise an exception or something if the "anonymous function" is too complicated... I did some fiddling and it seems lambdas can call methods and stuff that can have side effects, which kinda defeats what I thought was the point of "statements vs expressions"... I guess I just don't understand... maybe I'm just thick :-) > Anyway, the AlternateLambdaSyntax Wiki page has a couple of relevant > entries under 'real closures'. Where is that wiki BTW? I remember looking at it ages ago but can't find the link anymore. -- Donovan Baarda
Re: [Python-Dev] Recommend accepting PEP 312 -- Simple Implicit Lambda
Kay Schluehr wrote: > Josiah Carlson wrote: > > > Kay Schluehr <[EMAIL PROTECTED]> wrote: > > > > > >> Maybe anonymous function closures should be pushed forward right now not only syntactically? Personally I could live with lambda or several > >> of the alternative syntaxes listed on the wiki page. I must admit I ended up deleting most of the "alternative to lambda" threads after they flooded my inbox. So it is with some dread I post this, contributing to it... As I see it, a lambda is an anonymous function. An anonymous function is a function without a name. We already have a syntax for a function... why not use it? ie:

f = filter(def (a): return a > 1, [1,2,3])

The implications of this are that both functions and procedures can be anonymous. This also implies that unlike lambdas, anonymous functions can have statements, not just expressions. You can even do compound stuff like;

f = filter(def (a): b=a+1; return b>1, [1,2,3])

or if you want you can use indenting;

f = filter(def (a):
               b = a+1
               return b > 1,
           [1,2,3])

It also means the following becomes valid syntax;

f = def (a,b): return a>b

I'm not sure if there are syntactic ambiguities to this. I'm not sure if the CS boffins are disturbed by "side effects" from statements. Perhaps both can be resolved by limiting anonymous functions to expressions. Or require use of brackets or ";" to resolve ambiguity. This must have been proposed already and shot down in flames... sorry for re-visiting old stuff and contributing noise. -- Donovan Baarda
Re: [Python-Dev] Withdrawn PEP 288 and thoughts on PEP 342
On Fri, 2005-06-17 at 13:53, Joachim Koenig-Baltes wrote: [...] > My use case for this is a directory tree walking generator that > yields all the files including the directories in a depth first manner. > If a directory satisfies a condition (determined by the caller) the > generator shall not descend into it. > > Something like: > > DONOTDESCEND=1 > for path in mywalk("/usr/src"): > if os.path.isdir(path) and os.path.basename(path) == "CVS": > continue DONOTDESCEND > # do something with path > > Of course there are different solutions to this problem with callbacks > or filters but i like this one as the most elegant. I have implemented almost exactly this use case using the standard Python generators, and shudder at the complexity something like this would introduce. For me, the right solution would be to either write your own generator that "wraps" the other generator and filters it (as sketched below), or just make the generator with additional (default value) parameters that support the DONOTDESCEND filtering. FWIW, my use case is a directory comparison generator that walks two directories producing tuples of corresponding files. It optionally will not descend directories in either tree that do not have a corresponding directory in the other tree. See; http://minkirri.apana.org.au/~abo/projects/utils/ -- Donovan Baarda <[EMAIL PROTECTED]>
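The "wrap and filter" alternative reduces to something like this sketch using os.walk-style in-place pruning (the skip list is illustrative):

import os

def mywalk(top, skip=("CVS",)):
    # wrap os.walk, pruning unwanted directories before descent
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)

for path in mywalk("/usr/src"):
    pass   # do something with path; CVS directories never appear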
Re: [Python-Dev] Re: switch statement
On Mon, 2005-04-25 at 21:21 -0400, Brian Beck wrote: > Donovan Baarda wrote: > > Agreed. I don't find any switch syntaxes better than if/elif/else. Speed > > benefits belong in implementation optimisations, not new bad syntax. > > I posted this 'switch' recipe to the Cookbook this morning, it saves > some typing over the if/elif/else construction, and people seemed to > like it. Take a look: > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410692 Very clever... you have shown that current python syntax is capable of almost exactly replicating a C case statement. My only problem is C case statements are ugly. A simple if/elif/else is much more understandable to me. The main benefit in C of case statements is the compiler can optimise them. This copy of a C case statement will be slower than an if/elif/else, and just as ugly :-) -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Re: switch statement
On Mon, 2005-04-25 at 18:20 -0400, Jim Jewett wrote: [...] > If speed for a limited number of cases is the only advantage, > then I would say it belongs in (at most) the implementation, > rather than the language spec. Agreed. I don't find any switch syntaxes better than if/elif/else. Speed benefits belong in implementation optimisations, not new bad syntax. -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
On Mon, 2005-03-21 at 23:31 +1100, Donovan Baarda wrote: > On Mon, 2005-03-21 at 11:42 +0100, Peter Astrand wrote: > > On Mon, 21 Mar 2005, Donovan Baarda wrote: > > > > > > > The only ways to ensure that a select process does not block like > > > > > this, > > > > > without using non-blocking mode, are; > > > > > > 3) Use os.read / os.write. > > > [...] > > > > > > but os.read / os.write will block too. > > > > No. > [...] > > Hmmm... you are right... that changes things. Blocking vs non-blocking > becomes kinda moot if read/write will do partial writes in blocking > mode. > > > fread() should loop internally on EAGAIN, in blocking mode. > > Yeah, I was talking about non-blocking mode... Actually, in blocking mode you never get EAGAIN; read() only gets EAGAIN on an empty non-blocking read(). In non-blocking mode, EAGAIN is considered an error by fread(), so it will return a partial read. The Python implementation of file.read() will return this partial read and clear the EAGAIN error, or raise IOError if it was an empty read (to differentiate between an empty read and EOF). -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
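A sketch of how that behaviour lets calling code tell the two cases apart (assuming fd is an already-open Unix pipe or tty descriptor):

import os, fcntl, errno

flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
f = os.fdopen(fd)
try:
    data = f.read()
    if data == "":
        print "real EOF"
except IOError, e:
    if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
        print "empty non-blocking read, not EOF"
    else:
        raise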
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
On Tue, 2005-03-22 at 12:49 +1200, Greg Ewing wrote: > Donovan Baarda wrote: > > > Consider the following. This is pretty much the only way you can use > > popen2 reliably without knowing specific behaviours of the executed > > command; > > > > ... > > fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK) # \ > > ... # / > > fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)# \ > > I still don't believe you need to make these non-blocking. > When select() returns a fd for reading/writing, it's telling > you that the next os.read/os.write call on it will not block. > Making the fd non-blocking as well is unnecessary and perhaps > even undesirable. Yeah... for some reason I had it in my head that os.read/os.write would not do partial/incomplete reads/writes unless the file was in non-blocking mode. > > For 1) and 2), note that popen2 returns file objects, but as they cannot > > be reliably used as file objects, we ignore them and grab their > > fileno(). Why does popen2 return file objects if they cannot reliably be > > used? > > I would go along with giving file objects alternative read/write > methods which behave more like os.read/os.write, maybe called > something like readsome() and writesome(). That would eliminate > the need to extract and manipulate the fds, and might make it > possible to do some of this stuff in a more platform-independent > way. The fact that partial reads/writes are possible without non-blocking mode changes things a fair bit. Also, the lack of fcntl support in Windows needs to be taken into account too. I still think the support for partial reads in non-blocking mode on file.read() is inconsistent with the absence of partial write support in file.write(). I think this PEP still has some merit for cleaning up this inconsistency, but otherwise doesn't gain much... just adding a return count to file.write() and clearing up the documented behaviour is enough to do this. The lack of support on win32 for non-blocking mode, combined with the reduced need for it, makes adding a "setblocking" method undesirable. I don't know what the best thing to do now is... I guess readsome/writesome is probably best, but given that os.read/os.write is not that bad, perhaps it's best to just forget I even suggested this PEP :-) -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
On Mon, 2005-03-21 at 11:42 +0100, Peter Astrand wrote: > On Mon, 21 Mar 2005, Donovan Baarda wrote: > > > > > The only ways to ensure that a select process does not block like this, > > > > without using non-blocking mode, are; > > > > 3) Use os.read / os.write. > > [...] > > > > but os.read / os.write will block too. > > No. [...] Hmmm... you are right... that changes things. Blocking vs non-blocking becomes kinda moot if read/write will do partial writes in blocking mode. > fread() should loop internally on EAGAIN, in blocking mode. Yeah, I was talking about non-blocking mode... -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
G'day, From: "Greg Ward" <[EMAIL PROTECTED]> > On 18 March 2005, Donovan Baarda said: [...] > > Currently the built in file type does not support non-blocking mode very > > well. Setting a file into non-blocking mode and reading or writing to it > > can only be done reliably by operating on the file.fileno() file descriptor. > > This requires using the fnctl and os module file descriptor manipulation > > methods. > > Is having to use fcntl and os really so awful? At least it requires > the programmer to prove he knows what he's doing putting this file > into non-blocking mode, and that he really wants to do it. ;-) It's not that bad I guess... but then I'm proposing a very minor change to fix it. The bit that annoys me is popen2() and select() give this false sense of "File Object compatability", when in reality you can't use them reliably with file objects. It is also kind of disturbing that file.read() actually does work in non-blocking mode, but file.write() doesn't. The source for file.read() shows a fair bit of effort towards making it work for non-blocking mode... why not do the same for file.write()? > > Details > > === > > > > The documentation of file.read() warns; "Also note that when in non-blocking > > mode, less data than what was requested may be returned, even if no size > > parameter was given". An empty string is returned to indicate an EOF > > condition. It is possible that file.read() in non-blocking mode will not > > produce any data before EOF is reached. Currently there is no documented > > way to identify the difference between reaching EOF and an empty > > non-blocking read. > > > > The documented behaviour of file.write() in non-blocking mode is undefined. > > When writing to a file in non-blocking mode, it is possible that not all of > > the data gets written. Currently there is no documented way of handling or > > indicating a partial write. > > That's more interesting and a better motivation for this PEP. The other solution to this of course is to simply say "file.read() and file.write() don't work in non-blocking mode", but that would be a step backwards for the current file.read(). > > file.read([size]) Changes > > -- > > > > The read method's current behaviour needs to be documented, so its actual > > behaviour can be used to differentiate between an empty non-blocking read, > > and EOF. This means recording that IOError(EAGAIN) is raised for an empty > > non-blocking read. > > > > > > file.write(str) Changes > > > > > > The write method needs to have a useful behaviour for partial non-blocking > > writes defined, implemented, and documented. This includes returning how > > many bytes of "str" are successfully written, and raising IOError(EAGAIN) > > for an unsuccessful write (one that failed to write anything). > > Proposing semantic changes to file.read() and write() is bound to > raise hackles. One idea for soothing such objections: only make these > changes active when setblocking(False) is in effect. I.e., a > setblocking(True) file (the default, right?) behaves as you described > above, warts and all. (So old code that uses fcntl() continues to > "work" as before.) But files that have had setblocking(False) called > could gain these new semantics that you propose. There is nothing in this proposal that would break or change the behaviour of any existing code, unless it was relying on file.write() returning None. or checking that file objects don't have a "setblocking" method. 
Note that the change for file.read() is simply to document the current behaviour... not to actually change it. The change for file.write() is a little more dramatic, but I really can't imagine anyone relying on file.write() returning None. A compromise would be to have file.write() return None in blocking mode, and a count in non-blocking mode... but I still can't believe people would rely on it returning None :-) It would be more useful to always return a count, so that methods using them could handle both modes easily. Note that I did consider some more dramatic changes that would have made them even easier to use. Things like raising an exception for EOF instead of EAGAIN would actually make a lot of things easier to code... but it would be too big a change. Donovan Baarda http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
G'day, From: "Peter Astrand" <[EMAIL PROTECTED]> > On Mon, 21 Mar 2005, Donovan Baarda wrote: [...] > This is no "trap". When select() indicates that you can write or read, it > means that you can write or read at least one byte. The .read() and > .write() file methods, however, always writes and reads *everything*. > These works, basically, just like fread()/fwrite(). yep, which is why you can only use them reliably in a select loop if you read/write one byte at a time. > > The only ways to ensure that a select process does not block like this, > > without using non-blocking mode, are; > > > > 1) use a buffer size of 1 in the select process. > > > > 2) understand the child process's read/write behaviour and adjust the > > selector process accordingly... ie by making the buffer sizes just right > > for the child process, > > 3) Use os.read / os.write. [...] but os.read / os.write will block too. Try it... replace the file read/writes in selector.py. They will only do partial reads if the file is put into non-blocking mode. > > I think the fread/fwrite and read/write behaviour is posix standard and > > possibly C standard stuff... so it _should_ be the same on other > > platforms. > > Sorry if I've misunderstood your point, but fread()/fwrite() does not > return EAGAIN. no, fread()/fwrite() will return 0 if nothing was read/written, and ferror() will return EAGAIN to indicated that it was a "would block" condition at least I think it does... the man page simply says ferror() returns a non-zero value. Looking at the python implementation of file.read(), for an empty fread() where ferror() is non zero, it only raises IOError if errno is not EAGAIN or EWOULDBLOCK. It blindly clearerr()'s for any other partial read. The implementation of file.write() raises IOError whenever there is an incomplete write. So it looks, as I pointed out in the draft PEP, that the current file.read() supports non-blocking mode, but file.write() doesn't... a bit asymmetric :-) Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
On Mon, 2005-03-21 at 17:32 +1200, Greg Ewing wrote: > > On 18 March 2005, Donovan Baarda said: > > >>Many Python library methods and classes like select.select(), os.popen2(), > >>and subprocess.Popen() return and/or operate on builtin file objects. > >>However even simple applications of these methods and classes require the > >>files to be in non-blocking mode. > > I don't agree with that. There's no need to use non-blocking > I/O when using select(), and in fact things are less confusing > if you don't. You would think that... and the fact that select, popen2 etc all use file objects encourages you to think that. However, this is a trap that can catch you out badly. Check the attached python scripts that demonstrate the problem. Because staller.py outputs and flushes a fragment of data smaller than selector.py uses for its reads, the select statement is triggered, but the corresponding read blocks. A similar thing can happen with writes... if the child process consumes a fragment smaller than the write buffer of the selector process, then the select can trigger and the corresponding write can block because there is not enough space in the file buffer. The only ways to ensure that a select process does not block like this, without using non-blocking mode, are; 1) use a buffer size of 1 in the select process. 2) understand the child process's read/write behaviour and adjust the selector process accordingly... ie by making the buffer sizes just right for the child process. Note that it all interacts with the file objects' buffer sizes too... making for some extremely hard-to-debug intermittent behaviour. > >>The read method's current behaviour needs to be documented, so its actual > >>behaviour can be used to differentiate between an empty non-blocking read, > >>and EOF. This means recording that IOError(EAGAIN) is raised for an empty > >>non-blocking read. > > Isn't that unix-specific? The file object is supposed to > provide a more or less platform-independent interface, I > thought. I think the fread/fwrite and read/write behaviour is POSIX standard and possibly C standard stuff... so it _should_ be the same on other platforms. -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/ Attachments: selector.py, staller.py
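For readers without the attachments, the trap reduces to something like this (a reconstruction of the idea, not the attached scripts themselves; staller.py is assumed to write and flush fewer bytes than the parent reads):

import os, select

child_in, child_out = os.popen2("python staller.py")
i, o, e = select.select([child_out], [], [])   # fires: some bytes arrived
data = child_out.read(2048)   # but this blocks until 2048 bytes or EOF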
Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.
On Fri, 2005-03-18 at 20:41 -0500, James Y Knight wrote: > On Mar 18, 2005, at 8:19 PM, Greg Ward wrote: > > Is having to use fcntl and os really so awful? At least it requires > > the programmer to prove he knows what he's doing putting this file > > into non-blocking mode, and that he really wants to do it. ;-) Consider the following. This is pretty much the only way you can use popen2 reliably without knowing specific behaviours of the executed command;

import os, fcntl, select

def process_data(cmd, data):
    child_in, child_out = os.popen2(cmd)
    child_in = child_in.fileno()                                  # /
    flags = fcntl.fcntl(child_in, fcntl.F_GETFL)                  # | 1)
    fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # \
    child_out = child_out.fileno()                                # /
    flags = fcntl.fcntl(child_out, fcntl.F_GETFL)                 # | 2)
    fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # \
    ans = ""
    li = [child_out]
    lo = [child_in]
    while li or lo:
        i, o, e = select.select(li, lo, [])                       # 3)
        if i:
            buf = os.read(child_out, 2048)                        # 4)
            if buf:
                ans += buf
            else:
                li = []
        if o:
            if data:
                count = os.write(child_in, data[:2048])           # 4)
                data = data[count:]
            else:
                lo = []
    return ans

For 1) and 2), note that popen2 returns file objects, but as they cannot be reliably used as file objects, we ignore them and grab their fileno(). Why does popen2 return file objects if they cannot reliably be used? The flags get/set using fcntl is arcane stuff for what are pretty much essential operations after a popen2. These could be replaced by;

    child_in.setblocking(False)
    child_out.setblocking(False)

For 3), select() can operate on file objects directly. However, since you cannot reliably read/write file objects in non-blocking mode, we use the fileno's. Why can select operate with file objects if file objects cannot be reliably read/written? For 4), we are using the os.read/write methods to operate on the fileno's. Under the proposed PEP we could use the file objects' read/write methods instead. I guess the thing that annoys me the most is the asymmetry of popen2 and select using file objects, but needing to use the os.read/write and fileno()'s for reading and writing. > I'd tend to agree. :) Moreover, I don't think fread/fwrite are > guaranteed to work as you would expect with non-blocking file > descriptors. So, providing a setblocking() call to files would require > calling read/write instead of fread/fwrite in all the file methods, at > least when in non-blocking mode. I don't think that's a good idea. Hmm.. I assumed file.read() and file.write() were implemented using read/write from their observed behaviour. The documentation of fread/fwrite doesn't mention the behaviour in non-blocking mode at all. The observed behaviour suggests that fread/fwrite are implemented using read/write and hence get the same behaviour. The documentation implies that the behaviour in non-blocking mode will reflect the behaviour of read/write, with EAGAIN errors reported via ferror() indicating empty non-blocking reads/writes. If the behaviour of fread/fwrite is indeed indeterminate under non-blocking mode, then yes, file objects in non-blocking mode would have to use read/write instead of fread/fwrite. However, I don't think this is required. I know this PEP is kinda insignificant and minor. It doesn't save much, but it doesn't change much, and makes things a bit cleaner.
-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
[Python-Dev] Draft PEP to make file objects support non-blocking mode.
G'day, the recent thread about thread semantics for file objects reminded me I had a draft PEP for extending file objects to support non-blocking mode. This is handy for handling files in async applications (the non-threaded way of doing things concurrently). It's pretty rough, but if I fuss over it any more I'll never get it out... -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/

PEP: XXX
Title: Make builtin file objects support non-blocking mode
Version: $Revision: 1.0 $
Last-Modified: $Date: 2005/03/18 11:34:00 $
Author: Donovan Baarda <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jan-2005
Python-Version: 3.5
Post-History: 06-Jan-2005

Abstract
========

This PEP suggests a way that the existing builtin file type could be extended to better support non-blocking read and write modes required for asynchronous applications using things like select and popen2.

Rationale
=========

Many Python library methods and classes like select.select(), os.popen2(), and subprocess.Popen() return and/or operate on builtin file objects. However even simple applications of these methods and classes require the files to be in non-blocking mode. Currently the builtin file type does not support non-blocking mode very well. Setting a file into non-blocking mode and reading or writing to it can only be done reliably by operating on the file.fileno() file descriptor. This requires using the fcntl and os module file descriptor manipulation methods.

Details
=======

The documentation of file.read() warns; "Also note that when in non-blocking mode, less data than what was requested may be returned, even if no size parameter was given". An empty string is returned to indicate an EOF condition. It is possible that file.read() in non-blocking mode will not produce any data before EOF is reached. Currently there is no documented way to identify the difference between reaching EOF and an empty non-blocking read. The documented behaviour of file.write() in non-blocking mode is undefined. When writing to a file in non-blocking mode, it is possible that not all of the data gets written. Currently there is no documented way of handling or indicating a partial write. The file.read() and file.write() methods are implemented using the underlying C read() and write() functions. As a side effect of this, they have the following undocumented behaviour when operating on non-blocking files; A file.write() that fails to write all the provided data immediately will write part of the data, then raise IOError with an errno of EAGAIN. There is no indication how much of the data was successfully written. A file.read() that fails to read all the requested data immediately will return the partial data that was read. A file.read() that fails to read any data immediately will raise IOError with an errno of EAGAIN.

Proposed Changes
================

What is required is to add a setblocking() method that simplifies setting non-blocking mode, and extending/documenting read() and write() so they can be reliably used in non-blocking mode.

file.setblocking(flag) Extension
--------------------------------

This method implements the socket.setblocking() method for file objects. If flag is 0, the file is set to non-blocking, else to blocking mode.

file.read([size]) Changes
-------------------------

The read method's current behaviour needs to be documented, so its actual behaviour can be used to differentiate between an empty non-blocking read, and EOF. This means recording that IOError(EAGAIN) is raised for an empty non-blocking read.
file.write(str) Changes
-----------------------

The write method needs to have a useful behaviour for partial non-blocking writes defined, implemented, and documented. This includes returning how many bytes of "str" are successfully written, and raising IOError(EAGAIN) for an unsuccessful write (one that failed to write anything).

Impact of Changes
=================

As these changes are primarily extensions, they should not have much impact on any existing code. The file.read() changes are only documenting current behaviour. This could have no impact on any existing code. The file.write() change makes this method return an int instead of returning nothing (None). The only code this could affect would be something relying on file.write() returning None. I suspect there is no code that would do this. The file.setblocking() change adds a new method. The only existing code this could affect is code that checks for the presence/absence of a setblocking method on a file. There may be code out there that does this to differentiate between a file and a socket. As there are much better ways to do this, I suspect that there would be no code that does this.

Examples
========

For example, the following simple code using popen2 will "hang" if the huge_in string
Re: [Python-Dev] Re: No new features
G'day again, From: "Michael Hudson" <[EMAIL PROTECTED]> > "Donovan Baarda" <[EMAIL PROTECTED]> writes: > > > > > Just my 2c; > > > > I don't mind new features in minor releases, provided they meet the > > following two criteria; > > > > 1) Don't break the old API! The new features must be pure extensions that in > > no way change the old API. Any existing code should be un-effected in any > > way by the change. > > > > 2) The new features must be clearly documented as "New in version X.Y.Z". > > This way people using these features will know the minium Python version > > required for their application. > > No no no! The point of what Anthony is saying, as I read it, is that > experience suggests it is exactly this sort of change that should be > avoided. Consider the case of Mac OS X 10.2 which came with Python > 2.2.0: this was pretty broken anyway because of some apple snafus but > it was made even more useless by the fact that people out in the wild > were writing code for 2.2.1 and using True/False/bool. Going from > 2.x.y to 2.x.y+1 shouldn't break anything, going from 2.x.y+1 to 2.x.y > shouldn't break anything that doesn't whack into a bug in 2.x.y -- and > "not having bool" isn't a bug in this sense. You missed the "minor releases" bit in my post. major releases, ie 2.x -> 3.0, are for things that can break existing code. They change the API so that things that run on 2.x may not work with 3.x. minor releases, ie 2.2.x ->2.3.0, are for things that cannot break existing code. They can extend the API such that code for 2.3.x may not work on 2.2.x, but code that runs on 2.2.x must work on 2.3.x. micro releases, ie 2.2.1 ->2.2.2, are for bug fixes only. There can be no changes to the API, so that all code that runs on 2.2.2 should work with 2.2.1, barring the bugs fixed. The example you cited of adding bool was an extension to the API, and hence should have been a minor release, not a micro release. I just read the PEP-6, and it doesn't seem to use this terminology, or make this distinction... does something else do this anywhere? I thought this approach was common knowledge... Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: No new features (was Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules ossaudiodev.c, 1.35, 1.36)
G'day, From: "Anthony Baxter" <[EMAIL PROTECTED]> > On Wednesday 09 March 2005 12:21, Greg Ward wrote: > > On 09 March 2005, Anthony Baxter said (privately): > > > Thanks! I really want to keep the no-new-features thing going for > > > as long as possible, pending any AoG (Acts of Guido), of course. [...] > Initially, I was inclined to be much less anal about the no-new-features > thing. But since doing it, I've had a quite large number of people tell me how > much they appreciate this approach - vendors, large companies with huge > installed bases of Python, and also from people releasing software written > in Python. Very few people offer the counter argument as a general case - > with the obvious exception that everyone has their "just this one little > backported feature, plase!" (I'm the same - there's been times where > I've had new features I'd have loved to see in a bugfix release, just so I > could use them sooner). Just my 2c; I don't mind new features in minor releases, provided they meet the following two criteria; 1) Don't break the old API! The new features must be pure extensions that in no way change the old API. Any existing code should be un-effected in any way by the change. 2) The new features must be clearly documented as "New in version X.Y.Z". This way people using these features will know the minium Python version required for their application. Any change that breaks rule 1) must be delayed until a major release. Any change that breaks rule 2) is a documentation bug that needs fixing. Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
From: "Martin v. Löwis" <[EMAIL PROTECTED]> > Donovan Baarda wrote: > > This patch keeps the current md5c.c, md5module.c files and adds the > > following; _hashopenssl.c, hashes.py, md5.py, sha.py. > [...] > > If all we wanted to do was fix the md5 module > > If we want to fix the licensing issues with the md5 module, this patch > does not help at all, as it keeps the current md5 module (along with > its licensing issues). So any patch to solve the problem will need > to delete the code with the questionable license. It maybe half fixes it in that if Python is happy with the RSA one, they can continue to include it, and if Debian is unhappy with it, they can remove it and build against openssl. It doesn't fully fix the license problem. It is still worth considering because it doesn't make it worse, and it does allow Python to use much faster implementations and support other digest algorithms when openssl is available. > Then, the approach in the patch breaks the promise that the md5 module > is always there. It would require that OpenSSL is always there - a > promise that we cannot make (IMO). It would be better if found an alternative md5c.c. I found one that was the libmd implementation that someone mildly tweaked and then slapped an LGPL on. I have a feeling that would make the lawyers tremble more than the "public domain" libmd one, unless they are happy that someone else is prepared to wear the grief for slapping a LGPL onto something public domain. Probably the best at the moment is the sourceforge one, which is listed as having a "zlib/libpng licence". Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] builtin_id() returns negative numbers
From: "Armin Rigo" <[EMAIL PROTECTED]> > Hi Tim, > > > On Thu, Feb 17, 2005 at 01:44:11PM -0500, Tim Peters wrote: > > >256 ** struct.calcsize('P') > > > > Now if you'll just sign and fax a Zope contributor agreement, I'll > > upgrade ZODB to use this slick trick . > > I hereby donate this line of code to the public domain :-) Damn... we can't use it then! Seriously, on the Python lists there has been a discussion rejecting an md5sum implementation because the author "donated it to the public domain". Apparently lawyers have decided that you can't give code away. Intellectual charity is illegal :-) Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
On Wed, 2005-02-16 at 22:53 -0800, Gregory P. Smith wrote:
> fyi - i've updated the python sha1/md5 openssl patch. it now replaces
> the entire sha and md5 modules with a generic hashes module that gives
> access to all of the hash algorithms supported by OpenSSL (including
> appropriate legacy interface wrappers and falling back to the old code
> when compiled without openssl).
>
> https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470
>
> I don't quite like the module name 'hashes' that i chose for the
> generic interface (too close to the builtin hash() function). Other
> suggestions on a module name?

'digest' comes to mind.

I just had a quick look, and have these comments (pseudo patch review?). Apologies for the noise on the list...

DESCRIPTION
===========

This patch keeps the current md5c.c and md5module.c files and adds the following: _hashopenssl.c, hashes.py, md5.py, sha.py.

The old md5 and sha extension modules are replaced by the hashes.py, md5.py, and sha.py Python modules, which are built on top of either the _hash (openssl) or the _md5 and _sha (no openssl) extension modules.

The new _hash extension module "wraps" the high-level openssl EVP interface, which uses a string parameter to indicate what type of message digest algorithm to use. The advantage of this is that it makes all openssl-supported digests available, and if openssl adds more, we get them for free. A disadvantage is that it sits an abstraction level above the actual md5 and sha implementations, which may add overheads. These overheads are probably negligible compared to the actual implementation speedups.

The new _md5 and _sha extension modules are simply renamed versions of the old md5 and sha modules.

The hashes.py module acts as an import wrapper for _hash, and falls back to using the _md5 and _sha modules if _hash is not available. It provides an EVP-style API (string hash-name parameter) that supports only md5 and sha hashes when openssl is not available.

The new md5.py and sha.py modules simply use hashes.py.

COMMENTS
========

The introduction of a "hashes" module with a new API that supports many different digests (provided openssl is available) is extending Python, not just "fixing the licenses" of the md5 and sha modules. If all we wanted to do was fix the md5 module, a simpler solution would be to change the md5c.c API to match openssl's implementation, and make md5module.c use it, conditionally compiling against md5c.c or linking against openssl in setup.py. A similar approach could be used for sha, but would require stripping the sha implementation out of shamodule.c.

I am mildly concerned about the namespace/filespace clutter introduced by this implementation... it feels unnecessary, as do the tangled dependencies between the files. With openssl, hashes.py duplicates the functionality of _hash. Without openssl, md5.py and sha.py duplicate _md5 and _sha, via a roundabout route through hashes.py.

The Python wrappers seem overly complicated, with things like

    def new(name, string=None):
        if string:
            return _hash.new(name, string)
        else:
            return _hash.new(name)

being common, where the following would suffice:

    def new(name, string=""):
        return _hash.new(name, string)

I think this is because _hash.new() uses an optional string parameter, but I have a feeling a C update() with a zero-length string is faster than this Python if. If it was a concern, the C implementation could check the string length before calling update().
Given the convenience methods for different hashes in hashes.py (which incidentally look like they are only available when _hash is not available... something else that needs fixing), the md5.py module could be simply coded as:

    from hashes import md5
    new = md5

Despite all these nit-picks, it looks pretty good. It is orders of magnitude better than any of the other non-existent solutions, including the one I didn't code :-)

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
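For reference, a minimal sketch of the import-fallback arrangement described in the review above (the module layout follows the patch description; the function bodies are guesses, not the patch's actual code):

    # hashes.py -- sketch of the fallback wrapper (details assumed)
    try:
        import _hash                   # openssl EVP wrapper extension

        def new(name, string=""):
            return _hash.new(name, string)

    except ImportError:
        import _md5                    # renamed old md5 extension module
        import _sha                    # renamed old sha extension module

        _modules = {"md5": _md5, "sha": _sha}

        def new(name, string=""):
            try:
                obj = _modules[name].new()
            except KeyError:
                raise ValueError("unsupported hash type: " + name)
            if string:
                obj.update(string)
            return obj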
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
G'day,

On Sat, 2005-02-12 at 13:04 -0800, Gregory P. Smith wrote:
> On Sat, Feb 12, 2005 at 08:37:21AM -0500, A.M. Kuchling wrote:
> > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote:
> > > Are there any potential problems with making the md5 module's availability
> > > "optional" in the same way as this?
> >
> > The md5 module has been a standard module for a long time; making it
> > optional in the next version of Python isn't possible. We'd have to
> > require OpenSSL to compile Python.
> >
> > I'm happy to replace the MD5 and/or SHA implementations with other
> > code, provided other code with a suitable license can be found.
>
> agreed. it can not be made optional. What I'd prefer (and will do if
> i find the time) is to have the md5 and sha1 module use OpenSSLs
> implementations when available. Falling back to their built in ones
> when openssl isn't present. That way its always there but uses the
> much faster optimized openssl algorithms when they exist.

So we need a fallback md5 implementation for when openssl is not available.

The RSA implementation is not usable because it has an unsuitable license. Looking at this licence again, I'm not sure what the problem is. It allows you to freely modify, distribute, etc, with the only limit being that you must retain the RSA licence blurb.

The libmd implementation cannot be used because the author tried to give it away unconditionally, and the lawyers say you can't. (dumb! dumb! dumb! someone needs to figure out a way to systematically get around this kind of stupidity; perhaps have someone in a less legally stupid country claim and re-license free code.)

The libmd5-rfc sourceforge project implementation <http://sourceforge.net/projects/libmd5-rfc/> looks OK. It needs to be modified to have an API identical to openssl's (rename structures/functions). Then setup.py needs to be modified to use openssl if available, falling back to the provided libmd5-rfc implementation; see the sketch below.

The sha module is a bit different... it includes a built-in SHA implementation. It might pay to strip out the implementation and give it an openssl-like API, then make shamodule.c use it, or openssl if available.

Greg Smith might have already done much of this...

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
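A rough sketch of the setup.py side of that idea (illustrative only: Python's real setup.py detects libraries differently, and the macro name here is invented):

    # Sketch: pick an md5 backend at build time.
    from distutils.core import setup, Extension
    import os

    if os.path.exists("/usr/include/openssl/md5.h"):
        # link the module against openssl's optimized MD5
        md5_ext = Extension("md5", sources=["md5module.c"],
                            libraries=["crypto"],
                            define_macros=[("USE_OPENSSL_MD5", "1")])
    else:
        # fall back to the bundled libmd5-rfc implementation
        md5_ext = Extension("md5", sources=["md5module.c", "md5c.c"])

    setup(name="md5", version="1.0", ext_modules=[md5_ext])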
[Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c
On Sat, 2005-02-12 at 17:35 -0800, Gregory P. Smith wrote:
> I've created an OpenSSL version of the sha module. trivial to modify
> to be a md5 module. Its a first version with cleanup to be done and
> such. being managed in the SF patch manager:
>
> https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470
>
> enjoy. i'll do more cleanup and work on it soon.

Hmmm. I see the patch entry, but it seems to be missing the actual patch.

Did you code this from scratch, or did you base it on the current md5module.c? Is it using the openssl sha interface, or the higher level EVP interface? The reason I ask is that it would be pretty trivial to modify md5module.c to use the openssl API for any digest, and that would be less risky than fresh-coding one.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
G'day again, From: "Gregory P. Smith" <[EMAIL PROTECTED]> > > I think it would be cleaner and simpler to modify the existing > > md5module.c to use the openssl md5 layer API (this is just a > > search/replace to change the function names). The bigger problem is > > deciding what/how/whether to include the openssl md5 implementation > > sources so that win32 can use them. > > yes, that is all i was suggesting. > > win32 python is already linked against openssl for the socket module > ssl support, having the md5 and sha1 modules depend on openssl should > not cause a problem. IANAL... I have too much common sense, so I won't argue licences :-) So is openssl already included in the Python sources, or is it just a dependency? I had a quick look and couldn't find it so it must be a dependency. Given that Python is already dependant on openssl, it makes sense to change md5sum to use it. I have a feeling that openssl internally uses md5, so this way we wont link against two different md5sum implementations. Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
G'day, From: "Bob Ippolito" <[EMAIL PROTECTED]> > On Feb 11, 2005, at 6:11 PM, Donovan Baarda wrote: [...] > > Given that Python is already dependant on openssl, it makes sense to > > change > > md5sum to use it. I have a feeling that openssl internally uses md5, > > so this > > way we wont link against two different md5sum implementations. > > It is an optional dependency that is used when present (read: not just > win32). The sources are not included with Python. Are there any potential problems with making the md5sum module availability "optional" in the same way as this? > OpenSSL does internally have an implementation of md5 (and sha1, among > other things). Yeah, I know, that's why it could be used for the md5sum module :-) What I meant was a Python application using ssl sockets and the md5sum module will effectively have two different md5sum implementations in memory. Using the openssl md5sum for the md5sum module will make it "leaner", as well as faster. Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
On Fri, 2005-02-11 at 17:15 +1100, Donovan Baarda wrote: [...]
> I think it would be cleaner and simpler to modify the existing
> md5module.c to use the openssl md5 layer API (this is just a
> search/replace to change the function names). The bigger problem is
> deciding what/how/whether to include the openssl md5 implementation
> sources so that win32 can use them.

Thinking about it, probably the best way is to include the libmd md5c.c modified to use the openssl API, and then use configure to check for and use openssl if it is available. That way win32 could use the provided md5c.c, and other platforms could use the faster openssl.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
On Thu, 2005-02-10 at 23:13 -0500, Bob Ippolito wrote:
> On Feb 10, 2005, at 9:50 PM, Donovan Baarda wrote:
> > On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote: [...]
> > Only problem with this, is pyopenssl doesn't yet include any mdX or sha
> > modules.
>
> My bad, how about M2Crypto <http://sandbox.rulemaker.net/ngps/m2/>
> then? This one supports message digests and is more license compatible
> with Python to boot. [...]

This one does have md5 support, but the Python API is rather different from the current Python md5 API. It hooks into the slightly higher level EVP openssl layer, rather than the lower level md5 layer. Hooking into the EVP layer pretty much requires including all the openssl message digest implementations (which may or may not be a good idea).

It also uses SWIG to generate the extension module. I don't think anything else in Python itself uses SWIG, so starting to use it would introduce a build dependency.

I think it would be cleaner and simpler to modify the existing md5module.c to use the openssl md5 layer API (this is just a search/replace to change the function names). The bigger problem is deciding what/how/whether to include the openssl md5 implementation sources so that win32 can use them.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote:
> On Feb 10, 2005, at 9:15 PM, Donovan Baarda wrote:
> > On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote: [...]
> One possible alternative would be to bring in something like PyOpenSSL
> <http://pyopenssl.sourceforge.net/> and just rewrite the md5 (and sha?)
> extensions as Python modules that use that API.

The only problem with this is that pyopenssl doesn't yet include any mdX or sha modules.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote:
> > The md5.h/md5c.c files allow "copy and use", but no modification of
> > the files. There are some alternative implementations, i.e. in glibc,
> > openssl, so a replacement should be safe. Any other requirements when
> > considering a replacement?

One thing to consider is "degree of difficulty" :-)

> > Matthias
>
> I believe the "plan" for md5 and sha1 and such is to use the much
> faster openssl versions "in the future" (based on a long thread
> debating future interfaces to such things on python-dev last summer).
> That'll sidestep any tedious license issue and give a better
> implementation at the same time. i don't believe anyone has taken the
> time to make such a patch yet.

I wasn't around for that discussion. There are two viable replacements for the RSA implementation currently used: libmd <http://www.penguin.cz/~mhi/libmd/> and openssl <http://www.openssl.org/>.

The libmd implementation is by Colin Plumb and has the licence "This code is in the public domain; do with it what you wish." Its API is identical to the RSA implementation's and to the BSD world's libmd, and hence it is a drop-in replacement. This implementation is faster than the RSA implementation.

The openssl implementation has an Apache-style license. Its API is almost the same as, but slightly different from, the RSA API, so it would require a little bit of work to make it fit. This implementation is the fastest currently available, as it includes many platform-specific optimisations for a large range of platforms.

Currently md5c.c is included in the Python sources. The libmd implementation has a drop-in replacement for md5c.c. The openssl implementation is a complicated tangle of Makefile-expanded template code that would be harder to include in the Python sources.

In the Linux world, openssl is starting to become ubiquitous, so not including it, and instead statically or even dynamically linking against it, is feasible. However, using Python in other lands will probably require something to be included.

Long term, I think openssl is the way to go. Short term, libmd is a painless replacement that gets around the licencing issues.

I have been using the libmd API stuff for md4 in librsync, and am looking at migrating to the openssl API. If people hassle me, I could probably do the openssl API migration for Python, but I'm not sure what the best approach would be to including the source in the Python sources. FWIW, I also have an md4 module and md4c.c implementation that I'm happy to contribute to Python (done for pysync).

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
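Whichever C implementation is swapped in underneath, the Python-level md5 module interface is what has to keep working unchanged; for reference:

    import md5

    m = md5.new()            # the interface any replacement must preserve
    m.update("hello ")
    m.update("world")
    print m.hexdigest()      # 32-character hex digest of "hello world"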
Re: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up function calls)
On Tue, 2005-02-01 at 10:30 +1100, Donovan Baarda wrote:
> On Mon, 2005-01-31 at 15:16 -0500, Nathan Binkert wrote:
> > > Wouldn't it be nicer to have a facility that let you send messages
> > > between processes and manage concurrency properly instead? You'll need
[...]
> A quick google search revealed this;
>
> http://www.heise.de/ct/english/98/13/140/
>
> Keeping in mind the high overheads of sharing memory between CPU's, the
> discussion about threads at this url seems to confirm; threads with
> shared memory are hard to distribute over multiple CPU's. Different OS's
> and/or thread implementations have tried (or just outright rejected)
> different ways of doing it, to varying degrees of success. IMHO, the
> fact that QNX doesn't distribute threads speaks volumes.

Sorry for replying to my reply, but I forgot the bit that brings it all back On Topic :-)

The belief that the opcode-granularity thread switching driven by the GIL is the cause of Python's threads being non-distributable is only half true. Since OS's don't distribute threads well, any attempt to "Fix Python's Threading" to make its threads distributable is a waste of time. The only thing it might achieve would be to reduce the latency of thread switches, maybe allowing faster response to OS events like signals. However, the complexity introduced would cause more problems than it would fix, and could easily result in worse performance, not better.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up function calls)
On Mon, 2005-01-31 at 15:16 -0500, Nathan Binkert wrote:
> > Wouldn't it be nicer to have a facility that let you send messages
> > between processes and manage concurrency properly instead? You'll need
> > most of this anyway to do multithreading sanely, and the benefit to the
> > multiple process model is that you can scale to multiple machines, not
> > just processors. For brokering data between processes on the same
> > machine, you can use mapped memory if you can't afford to copy it
> > around, which gives you basically all the benefits of threads with
> > fewer pitfalls.
>
> I don't think this is an answered problem. There are plenty of
> researchers on both sides of this fence. It is not been proven at all
> that threads are a bad model.
>
> http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf or even
> http://www.python.org/~jeremy/weblog/030912.html

These are both threads-vs-events discussions (ie, threads vs an async event-handler loop). They have nearly nothing to do with multiple-CPU utilisation. The real discussion for multiple-CPU utilisation is threads vs processes.

Once again, my knowledge of this is old and possibly out of date, but threads do not scale well on multiple CPU's because all the threads share the same memory. Multiple-CPU hardware _can_ have physically shared memory, but it is hardware hell keeping CPU caches in sync, etc. It is much easier to build a multi-CPU machine with separate memory for each CPU and high-speed communication channels between CPUs. I suspect most modern multi-CPU machines use this architecture.

Assuming they have the separate-memory architecture, you get much better CPU utilisation if you design your program as separate processes communicating with each other, not threads sharing memory; see the sketch below. In fact, it wouldn't surprise me if most Operating Systems that support threads don't support distributing threads over multiple CPU's at all.

A quick google search revealed this;

http://www.heise.de/ct/english/98/13/140/

Keeping in mind the high overheads of sharing memory between CPU's, the discussion about threads at this url seems to confirm; threads with shared memory are hard to distribute over multiple CPU's. Different OS's and/or thread implementations have tried (or just outright rejected) different ways of doing it, to varying degrees of success. IMHO, the fact that QNX doesn't distribute threads speaks volumes.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
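A toy sketch of the separate-processes style being argued for, using nothing beyond os.fork() and os.pipe():

    import os

    # Do work in a separate process and pass the result back over a
    # pipe, rather than sharing memory between threads.
    def run_in_child(func, arg):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                    # child: compute, report, exit
            os.close(r)
            os.write(w, str(func(arg)))
            os._exit(0)
        os.close(w)                     # parent: collect the result
        result = os.read(r, 4096)
        os.close(r)
        os.waitpid(pid, 0)
        return result

    print run_in_child(lambda n: sum(range(n)), 1000)   # prints 499500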
Re: [Python-Dev] Python Interpreter Thread Safety?
On Sat, 2005-01-29 at 00:24 +0100, "Martin v. Löwis" wrote:
> Evan Jones wrote: [...]
> The allocator is thread-safe in the presence of the GIL - you are
> supposed to hold the GIL before entering the allocator. Due to some
> unfortunate historical reasons, there is code which enters free()
> without holding the GIL - and that is what the allocator specifically
> deals with. Except for this single case, all callers of the allocator
> are required to hold the GIL.

Just curious; is that "one case" a bug that needs fixing, or is there some reason this case can't be changed to use the GIL? Surely making it mandatory for all free() calls to hold the GIL is easier than making the allocator deal with the one case where this isn't done.

I like the GIL :-) So much so that I'd like to see it visible at the Python level. Then you could write your own atomic methods in Python; a lock-based approximation is sketched below.

BTW, if what Evan is hoping for is concurrent threads running on different processors in a multiprocessor system, then don't :-) It's been a while since I looked at multiprocessor architectures, but I believe threading's shared-memory paradigm will always be hard to distribute efficiently over multiple CPU's. If you want to run on multiple processors, use processes, not threads.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
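Since the GIL is not actually visible at the Python level, the closest approximation today is an explicit lock; a minimal sketch:

    import threading

    class Counter:
        # A Python-level stand-in for the GIL-style atomicity wished for
        # above: an explicit lock makes the method atomic with respect to
        # other threads using the same Counter.
        def __init__(self):
            self._lock = threading.Lock()
            self.value = 0

        def increment(self):
            self._lock.acquire()
            try:
                self.value = self.value + 1
            finally:
                self._lock.release()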
Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6
On Wed, 2005-01-26 at 01:53 +1100, Anthony Baxter wrote:
> On Wednesday 26 January 2005 01:01, Donovan Baarda wrote:
> > In this case it turns out to be "don't do exec() in a thread, because what
> > you exec can have all its signals masked". That turns out to be a hell of
> > a lot of things; popen, os.system, etc. They all only work OK in a
> > threaded application if what you are exec'ing doesn't use any signals.
>
> Yep. You just have to be aware of it. We do a bit of this at work, and we
> either spool via a database table, or a directory full of spool files.
>
> > Actually, I've noticed that zope often has a sorta zombie "which" process
> > which it spawns. I wonder if this is a stuck thread waiting for some
> > signal...
>
> Quite likely.

For the record, it seems that the java version also contributes. This problem only occurs when you have the following combination:

  Linux >= 2.6
  Python <= 2.3
  j2re1.4 = 1.4.2.01-1 (or kaffe 2:1.1.4xxx)

If you use Linux 2.4, it goes away. If you use Python 2.4, it goes away. If you use j2re1.4 1.4.1.01-1, it goes away. For the problem to occur, all of the following need to hold:

1) Linux uses the thread's sigmask instead of the main thread/process sigmask for the exec'ed process (ie, 2.6 does this, 2.4 doesn't).

2) Python screws with the sigmask in threads (python 2.3 does, python 2.4 doesn't).

3) The exec'ed process relies on threads (j2re1.4 1.4.2.01-1 does, j2re1.4 1.4.1.01-1 doesn't).

It is hard to find old Debian deb's of j2re1.4 (1.4.1.01-1), and when you do, you will also need the now non-existent j2se-common 1.1 package. I don't know if this qualifies as a potential bug against j2re1.4 1.4.2.01-1. For now my solution is to roll back to the older j2re1.4.

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6
G'day, From: "Anthony Baxter" <[EMAIL PROTECTED]> > On Thursday 20 January 2005 12:43, Donovan Baarda wrote: > > On Wed, 2005-01-19 at 13:37 +, Michael Hudson wrote: > > > The main oddness about python threads (before 2.3) is that they run > > > with all signals masked. You could play with a C wrapper (call > > > setprocmask, then exec fop) to see if this is what is causing the > > > problem. But please try 2.4. > > > > Python 2.4 does indeed fix the problem. Unfortunately we are using Zope > > 2.7.4, and I'm a bit wary of attempting to migrate it all from 2.3 to > > 2.4. Is there any wa this "Fix" can be back-ported to 2.3? > > It's extremely unlikely - I couldn't make myself comfortable with it > when attempting to figure out it's backportedness. While the current > behaviour on 2.3.4 is broken in some cases, I fear very much that > the new behaviour will break other (working) code - and this is > something I try very hard to avoid in a bugfix release, particularly > in one that's probably the final one of a series. > > Fundamentally, the answer is "don't do signals+threads, you will > get burned". For your application, you might want to instead try In this case it turns out to be "don't do exec() in a thread, because what you exec can have all it's signals masked". That turns out to be a hell of a lot of things; popen, os.command, etc. They all only work OK in a threaded application if what you are exec'ing doesn't use any signals. > something where you write requests to a file in a spool directory, > and have a python script that loops looking for requests, and > generates responses. This is likely to be much simpler to debug > and work with. Hmm, interprocess communications; great fun :-) And no spawning the process from within the zope application; it's gotta be a separate daemon. Actually, I've noticed that zope often has a sorta zombie "which" process which it spawns. I wonder it this is a stuck thread waiting for some signal... Donovan Baardahttp://minkirri.apana.org.au/~abo/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6
On Thu, 2005-01-20 at 14:12 +0000, Michael Hudson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> writes:
> > On Wed, 2005-01-19 at 13:37 +0000, Michael Hudson wrote:
> >> Donovan Baarda <[EMAIL PROTECTED]> writes: [...]
> >> The main oddness about python threads (before 2.3) is that they run
> >> with all signals masked. You could play with a C wrapper (call
> >> setprocmask, then exec fop) to see if this is what is causing the
> >> problem. But please try 2.4.
> >
> > Python 2.4 does indeed fix the problem.
>
> That's good to hear. [...]

I still don't understand what Linux 2.4 vs Linux 2.6 had to do with it. Reading the man pages for execve(), pthread_sigmask() and sigprocmask(), I can see some ambiguities, but mostly only if you do things they warn against (ie, use sigprocmask() instead of pthread_sigmask() in a multi-threaded app).

The man page for execve() says that the new process will inherit the "Process signal mask (see sigprocmask())". This implies to me that it will inherit the mask from the main process, not the thread's signal mask. It looks like Linux 2.4 uses the signal mask of the main thread or process for the execve(), whereas Linux 2.6 uses the thread's signal mask. Given that execve() replaces the whole process, including all threads, I dunno if using the thread's mask is right. Could this be a Linux 2.6 kernel bug?

> > I'm not sure what the correct behaviour should be. The fact that it
> > works in python2.4 feels more like a byproduct of the thread mask change
> > than correct behaviour.
>
> Well, getting rid of the thread mask changes was one of the goals of
> the change.

I gathered that... which kinda means the fact that it fixed execvp in threads is a side effect... (though I also guess it fixed a lot of other things like this too).

> > To me it seems like execvp() should be setting the signal mask back
> > to defaults or at least the mask of the main process before doing
> > the exec.
>
> Possibly. I think the 2.4 change -- not fiddling the process mask at
> all -- is the Right Thing, but that doesn't help 2.3 users. This has
> all been discussed before at some length, on python-dev and in various
> bug reports on SF.

Would a simple bug-fix for 2.3 be to have os.execvp() set the mask to something sane before executing the C execvp()? Given that Python does not have any visibility of the procmask... This might be a good idea regardless, as it will protect against this bug resurfacing in the future if someone decides fiddling with the mask for threads is a good idea again.

> In your situation, I think the simplest thing you can do is dig out an
> old patch of mine that exposes sigprocmask + co to Python and either
> make a custom Python incorporating the patch and use that, or put the
> code from the patch into an extension module. Then before execing
> fop, use the new code to set the signal mask to something sane. Not
> pretty, particularly, but it should work.

The extension module that exposes sigprocmask() is probably best for now...

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
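A sketch of how such an extension module could be used to work around the 2.3 behaviour (the _sigmask module here is hypothetical; Python 2.3 itself exposes no sigprocmask wrapper):

    import os
    import _sigmask    # hypothetical extension wrapping sigprocmask(2)

    def clean_execvp(path, args):
        # Clear the signal mask inherited from the calling thread so the
        # new program starts with no signals blocked.
        _sigmask.sigprocmask(_sigmask.SIG_SETMASK, [])
        os.execvp(path, args)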
Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6
On Wed, 2005-01-19 at 13:37 +0000, Michael Hudson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> writes: [...]
> You've left out a very important piece of information: which version
> of Python you are using. I'm guessing 2.3.4. Can you try 2.4?

Debian Python2.3 (2.3.4-18), Debian kernel-image-2.6.8-1-686 (2.6.8-10), and Debian kernel-image-2.4.27-1-686 (2.4.27-6).

> I'd be astonished if this is the same bug.
>
> The main oddness about python threads (before 2.3) is that they run
> with all signals masked. You could play with a C wrapper (call
> setprocmask, then exec fop) to see if this is what is causing the
> problem. But please try 2.4.

Python 2.4 does indeed fix the problem. Unfortunately we are using Zope 2.7.4, and I'm a bit wary of attempting to migrate it all from 2.3 to 2.4. Is there any way this "Fix" can be back-ported to 2.3?

Note that this problem is being triggered when using Popen3() in a thread. Popen3() simply uses os.fork() and os.execvp(). The segfault is occurring in the execvp'ed process. I'm sure there must be plenty of cases where this could happen. I think most people manage to avoid it because the processes they are popen'ing or exec'ing happen to not use signals.

After testing a bit, it seems the fork() in Popen3 is not a contributing factor. The problem occurs whenever os.execvp() is executed in a thread. It looks like the exec'ed command inherits the masked signals from the thread. I'm not sure what the correct behaviour should be. The fact that it works in python2.4 feels more like a byproduct of the thread mask change than correct behaviour. To me it seems like execvp() should be setting the signal mask back to defaults, or at least to the mask of the main process, before doing the exec.

> > BTW, built in file objects really could use better non-blocking
> > support... I've got a half-drafted PEP for it... anyone interested in
> > it?
>
> Err, this probably should be in a different mail :)

The verbosity of the attached test code, caused by this issue, prompted that comment... so it's vaguely related :-)

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/
[Python-Dev] Strange segfault in Python threads and linux kernel 2.6
G'day,

I've Cc'ed this to zope-coders as it might affect other Zope developers, and it had me stumped for ages. I couldn't find anything on it anywhere, so I figured it would be good to get something into google :-).

We are developing a Zope2.7 application on Debian GNU/Linux that is using fop to generate PDFs from XML-FO data. fop is a java thing, and we are using popen2.Popen3(), non-blocking mode, and a select loop to write/read stdin/stdout/stderr. This was all working fine.

Then over the Christmas chaos, various things on my development system were apt-get updated, and I noticed that java/fop had started segfaulting. I tried running fop with the exact same input data from the command line; it worked. I wrote a python script that invoked fop in exactly the same way as we were invoking it inside zope; it worked. It only segfaulted when invoked inside Zope.

I googled and tried everything... switched from j2re1.4 to kaffe, rolled back to a previous version of python, re-built Zope, upgraded Zope from 2.7.2 to 2.7.4; nothing helped. Then I went back from a linux 2.6.8 kernel to a 2.4.27 kernel; it worked!

After googling around, I found references to recent attempts to resolve some signal handling problems in Python threads. There was one post that mentioned subtle differences between how Linux 2.4 and Linux 2.6 delivered signals to threads. So it seems this is a problem with Python threads and Linux kernel 2.6. The attached program demonstrates that it has nothing to do with Zope. Using it to run "fop-test /usr/bin/fop [...] <http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=971213>. Is this the same bug? Should I submit a new bug report? Is there any other way I can help resolve this?

BTW, built-in file objects really could use better non-blocking support... I've got a half-drafted PEP for it... anyone interested in it?

-- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/

Attachment: test-fop.py (application/python)
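A cut-down sketch of the kind of test the attached test-fop.py performs (the fop arguments here are illustrative, and the real script drives stdin/stdout/stderr with non-blocking I/O and a select loop):

    import threading, popen2

    # Exec a signal-using program from inside a thread; on the affected
    # kernel/Python combinations the child starts with all signals
    # masked and (in fop's case) segfaults.
    def run(cmd):
        child = popen2.Popen3(cmd, capturestderr=True)
        child.tochild.close()
        print child.fromchild.read()
        print "exit status:", child.wait()

    t = threading.Thread(target=run, args=("/usr/bin/fop -version",))
    t.start()
    t.join()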