Re: [Python-Dev] Dropping __init__.py requirement for subpackages

2006-04-26 Thread Donovan Baarda
Guido van Rossum wrote:
> On 4/26/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> 
>>On Wed, 2006-04-26 at 10:16 -0700, Guido van Rossum wrote:
>>
>>
>>>So I have a very simple proposal: keep the __init__.py requirement for
>>>top-level packages, but drop it for subpackages. This should be a
>>>small change. I'm hesitant to propose *anything* new for Python 2.5,
>>>so I'm proposing it for 2.6; if Neal and Anthony think this would be
>>>okay to add to 2.5, they can do so.
[...]
>>I'd be -1 but the remote possibility of you being burned at the stake by
>>your fellow Googlers makes me -0 :).
> 
> 
> I'm not sure I understand what your worry is.

I happen to be a Googler too, but I was a Pythonista first...

I'm -1 for minor mainly subjective reasons;

1) explicit is better than implicit. I prefer to be explicit about what 
is and isn't a module. I have plenty of "doc" and "test" and other 
directories inside python module source trees that I don't want to be 
python modules.

2) It feels more consistent to always require it. /foo/ is a python 
package because it contains an __init__.py... so package /foo/bar/ 
should have one too.

3) It changes things for what feels like very little gain. I've never 
had problems with it, and don't find the import exception hard to diagnose.

Note that I think the vast majority of "newbie missing __init__.py" 
problems within google occur because people are missing __init__.py at 
the root of the package import tree. This change would not solve that 
problem.

It wouldn't surprise me if this change introduced a slew of newbies 
complaining "I have /foo on my PYTHONPATH, why can't I import 
foo/bar/" because they've forgotten the (now) rarely required __init__.py.


--
Donovan Baarda


[Python-Dev] Default Locale, was; Re: strftime/strptime locale funnies...

2006-04-06 Thread Donovan Baarda
On Wed, 2006-04-05 at 12:13 -0700, Brett Cannon wrote:
> On 4/5/06, Donovan Baarda <[EMAIL PROTECTED]> wrote:
> > G'day,
> >
> > Just noticed on Debian (testing), Ubuntu (warty?), and RedHat (old)
> > based systems Python's time.strptime() seems to ignore the environment's
> > Locale and just uses "C".
[...]
> Beats me.  This could be a locale thing.  If I remember correctly
> Python assumes the C locale on some things.  I suspect the reason for
> this is in the locale module or libc.  But you can't even find the
> word 'locale' or 'Locale' in timemodule.c nor do I know of any calls
> that mess with the locale, so I doubt 'time' is at fault for this.

OK, I've found and confirmed what it is with a quick C program. The
default Locale for libc is 'C'. It is up to the program to set its locale
to match the environment using;

  setlocale(LC_ALL,"");

The Python locale module documents this, and recommends putting;

import locale
locale.setlocale(locale.LC_ALL, '')

At the top of programs to make them use your locale as specified in your
environment.

Note that locale.resetlocale() is documented as "resets the locale to
the default settings", where the default is determined by
locale.getdefaultlocale(), which uses the environment.

So the "default" is determined from your environment, but "C" is used by
default... nice and confusing :-)
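
To see the gotcha in action (a sketch; the exact locale names depend on
your environment):

    >>> import locale
    >>> locale.getlocale()
    (None, None)
    >>> locale.setlocale(locale.LC_ALL, '')
    'en_AU.UTF-8'
    >>> locale.getlocale()
    ('en_AU', 'UTF-8')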

Should Python do setlocale(LC_ALL,"") on startup so that the "default"
locale is used by default?

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



[Python-Dev] strftime/strptime locale funnies...

2006-04-05 Thread Donovan Baarda
G'day,

Just noticed on Debian (testing), Ubuntu (warty?), and RedHat (old)
based systems Python's time.strptime() seems to ignore the environment's
Locale and just uses "C".

Last time I looked at this, time.strptime() leveraged off the platform's
strptime(), which meant it had all the extra features, bugs and
missingness of the platform's implementation. 

We now seem to be using a Python implementation in _strptime.py. This
implementation handles Locales by feeding a magic date to time.strftime()
and figuring out how it formats it.

This revealed that time.strftime() is not honouring the Locale settings,
which is causing the new Python strptime() to also get it wrong.

$ set | grep "^LC\|LANG"
GDM_LANG=en_AU.UTF-8
LANG=en_AU.UTF-8
LANGUAGE=en_AU.UTF-8
LC_COLLATE=C

$ date -d "1999-02-22" +%x
22/02/99

$ python
...
>>> import time
>>> time.strftime("%x", time.strptime("1999-02-22","%Y-%m-%d"))
'02/22/99'

This is consistent across all three platforms for multiple Python
versions, including 2.1 and 1.5 (where they were available) which BTW
don't use the Python implementation of strptime().

This suggests that all three of these platforms have a broken libc
strftime() implementation... but all three? And why does date work?

Can others reproduce this? Have I done something stupid? Is this a bug,
and in what, libc or Python?
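
FWIW, explicitly adopting the environment's Locale first does change the
strftime() result for me (a sketch; assumes an en_AU locale is
installed):

    >>> import locale, time
    >>> locale.setlocale(locale.LC_ALL, '')
    'en_AU.UTF-8'
    >>> time.strftime("%x", time.strptime("1999-02-22", "%Y-%m-%d"))
    '22/02/99'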

Slightly OT, is it wise to use a Python strptime() on platforms that
have a perfectly good one in libc? The Python reverse-engineering of
libc's strftime() output to figure out locale formatting is clever,
but...

I see there have already been bugs submitted about strftime/strptime
non-symmetry for things like support of extensions. There has also been
a bug against strptime() Locale switching not working because of caching
Locale formatting info from the strftime() analysis, but I can't seem to
get non-C Locales working at all...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Threading idea -- exposing a global thread lock

2006-03-14 Thread Donovan Baarda
On Tue, 2006-03-14 at 00:36 -0500, Raymond Hettinger wrote:
> [Guido]
> > Oh, no!
> 
> Before shooting this one down, consider a simpler incarnation not involving 
> the 
> GIL.  The idea is to allow an active thread to temporarily suspend switching 
> for 
> a few steps:
[...]
> I disagree that the need is rare.  My own use case is that I sometimes add 
> some 
> debugging print statements that need to execute atomically -- it is a PITA 
> because PRINT_ITEM and PRINT_NEWLINE are two different opcodes and are not 
> guaranteed to pair atomically.  The current RightWay(tm) is for me to create 
> a 
> separate daemon thread for printing and to send lines to it via the queue 
> module 
> (even that is tricky because you don't want the main thread to exit before a 
> print queued item is completed).  I suggest that that is too complex for a 
> simple debugging print statement.  It would be great to simply write:

You don't need to use a queue... that has the potentially nasty side
effect of allowing threads to run ahead before their debugging has been
output. A better way is to have all your debugging go through a
print_debug() function that acquires and releases a shared
threading.Lock. This is simpler as it avoids the separate thread, and
ensures that threads "pause" until their debugging output is done.
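
Something like this (an untested sketch):

    import threading

    debug_lock = threading.Lock()

    def print_debug(msg):
        # PRINT_ITEM and PRINT_NEWLINE now always pair up per message
        debug_lock.acquire()
        try:
            print msg
        finally:
            debug_lock.release()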

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Threading idea -- exposing a global thread lock

2006-03-14 Thread Donovan Baarda
On Mon, 2006-03-13 at 21:06 -0800, Guido van Rossum wrote:
> Oh, no! Please!
> 
> I just had to dissuade someone inside Google from the same idea.

Heh... that was me... I LOL'ed when I saw this... and no, I didn't put
Raymond up to it :-)

> IMO it's fatally flawed for several reasons: it doesn't translate
> reasonably to Jython or IronPython, it's really tricky to implement,
> and it's an invitation for deadlocks. The danger of this thing in the
> wrong hands is too big to warrant the (rare) use case that can only be
> solved elegantly using direct GIL access.

I didn't bother pursuing it because I'm not that attached to it... I'm
not sure that a language like Python really needs it, and I don't do
that kind of programming much any more.

When I did, I was programming in Ada. The Ada language has a global
thread-lock used as a primitive to implement all other atomic operations
and thread-synchronisation stuff... (it's been a while... this may have
been a particular Ada compiler extension, though I think the Ada
concurrency model pretty much required it). And before that it was in
assembler; an atomic section was done by disabling all interrupts. At
that low-level, atomic sections were the building-block for all the
other higher level synchronisation tools. I believe the original
semaphore relied on an atomic test-and-set operation.

The main place where something like this would be useful in Python is in
writing thread-safe code that uses non-thread safe resources. Examples
are; a chunk of code that redirects then restores sys.stdout, something
that changes then restores TZ using time.tzset(), etc.
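
For example, a redirect-and-restore like this is only safe if every
thread that touches sys.stdout goes through the same lock (a sketch; the
lock stands in for the atomic section being discussed):

    import sys, threading

    stdout_lock = threading.Lock()

    def call_with_stdout(stream, func):
        stdout_lock.acquire()
        try:
            saved, sys.stdout = sys.stdout, stream
            try:
                return func()
            finally:
                sys.stdout = saved
        finally:
            stdout_lock.release()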

I think the deadlock risk argument is bogus... any locking has deadlock
risks. The "danger in the wrong hands" I'm also unconvinced about;
non-threadsafe resource use worries me far more than a strong lock. I'd
rather debug a deadlock than a race condition any day. But being hard to
implement for other VMs is a deal-breaker, and suggests there are damn
good reasons those VMs disallow it that I haven't thought of :-)

So I'm +0, probably -0.5...

> --Guido
> 
> On 3/13/06, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> > A user on comp.lang.python has twisted himself into knots writing 
> > multi-threaded
> > code that avoids locks and queues but fails when running code with 
> > non-atomic
> > access to a shared resource. While his specific design is somewhat flawed, 
> > it
> > does suggest that we could offer an easy way to make a block of code atomic
> > without the complexity of other synchronization tools:
> >
> >gil.acquire()
> >try:
> >   #do some transaction that needs to be atomic
> >finally:
> >   gil.release()
> >
> > The idea is to temporarily suspend thread switches (either using the GIL or 
> > a
> > global variable in the eval-loop).  Think of it as "non-cooperative"
> > multi-threading. While this is a somewhat rough approach, it is dramatically
> > simpler than the alternatives (i.e. wrapping locks around every access to a
> > resource or feeding all resource requests to a separate thread via a Queue).
> >
> > While I haven't tried it yet, I think the implementation is likely to be
> > trivial.
> >
> > FWIW, the new with-statement makes the above fragment even more readable:
> >
> > with atomic_transaction():
> > # do a series of steps without interruption
> >
> >
> > Raymond
> >
> 
> 
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] bytes.from_hex()

2006-03-01 Thread Donovan Baarda
On Tue, 2006-02-28 at 15:23 -0800, Bill Janssen wrote:
> Greg Ewing wrote:
> > Bill Janssen wrote:
> > 
> > > bytes -> base64 -> text
> > > text -> de-base64 -> bytes
> > 
> > It's nice to hear I'm not out of step with
> > the entire world on this. :-)
> 
> Well, I can certainly understand the bytes->base64->bytes side of
> thing too.  The "text" produced is specified as using "a 65-character
> subset of US-ASCII", so that's really bytes.

Huh... just joining here but surely you don't mean a text string that
doesn't use every character available in a particular encoding is
"really bytes"... it's still a text string...

If you base64 encode some bytes, you get a string. If you then want to
access that base64 string as if it was a bunch of bytes, cast it to
bytes.

Be careful not to confuse "(type)cast" with "(type)convert"... 

A "convert" transforms the data from one type/class to another,
modifying it to be a valid equivalent instance of the other type/class;
ie int -> float. 

A "cast" does not modify the data in any way, it just changes its
type/class to be the other type, and assumes that the data is a valid
instance of the other type; eg int32 -> bytes[4]. Minor data munging
under the hood to cleanly switch the type/class is acceptable (ie adding
array length info etc) provided you keep to the spirit of the "cast".
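
For example (a sketch; struct.pack is about the closest thing Python has
to a C-style cast):

    import struct

    f = float(1234)              # convert: a new, equivalent value
    b = struct.pack("i", 1234)   # cast-ish: the same (typically 4) bytes,
                                 # just retyped as a byte string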

Keep these two concepts separate and you should be right :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] calendar.timegm

2006-02-23 Thread Donovan Baarda
On Tue, 2006-02-21 at 22:47 -0600, [EMAIL PROTECTED] wrote:
> Sergey> Historical question ;)
> 
> Sergey> Anyone can explain why function timegm is placed into module
> Sergey> calendar, not to module time, where it would be near with
> Sergey> similar function mktime?
> 
> Historical accident. ;-)

It seems time contains a simple wrapper around the equivalent C
functions. There is no C equivalent to timegm() (how do they do it?).

The timegm() function is implemented in python using the datetime
module. The name sux BTW.

It would be nice if there was a time.mkgmtime(), but it would need to be
implemented in C.
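
In the meantime the pure-python version does work as the inverse of
time.gmtime(), e.g.:

    import calendar, time

    t = time.gmtime()            # UTC time tuple
    stamp = calendar.timegm(t)   # back to seconds-since-epoch, no TZ games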


-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] threadsafe patch for asynchat

2006-02-09 Thread Donovan Baarda
On Thu, 2006-02-09 at 13:12 +0100, Fredrik Lundh wrote:
> Donovan Baarda wrote:
> 
> >> Here I think you meant that medusa didn't handle computation in separate
> >> threads instead.
> >
> > No, I pretty much meant what I said :-)
> >
> > Medusa didn't have any concept of a deferred, hence the idea of using
> > one to collect the results of a long computation in another thread never
> > occurred to them... remember the highly refactored OO beauty that is
> > twisted was not even a twinkle in anyone's eye yet.
> >
> > In theory it would be just as easy to add twisted style deferToThread to
> > Medusa, and IMHO it is a much better approach. Unfortunately at the time
> > they went the other way and implemented multiple async-loops in separate
> > threads.
> 
> that doesn't mean that everyone using Medusa has done things in the wrong
> way, of course ;-)

Of course... and even Zope2 was not necessarily the "wrong way"... it
was a perfectly valid design decision, given that it was all new ground
at the time. And it works really well... there were many consequences of
that design that probably contributed to the robustness of other Zope
components like ZODB...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] threadsafe patch for asynchat

2006-02-09 Thread Donovan Baarda
On Wed, 2006-02-08 at 15:14 +0100, Valentino Volonghi aka Dialtone
wrote:
> On Wed, Feb 08, 2006 at 01:23:26PM +0000, Donovan Baarda wrote:
> > I believe that Twisted does pretty much this with its "deferred" stuff.
> > It shoves slow stuff off for processing in a separate thread that
> > re-syncs with the event loop when it's finished.
> 
> Deferreds are only an elaborate way to deal with a bunch of callbacks.
> It's Twisted itself that provides a way to run something in a separate thread
> and then fire a deferred (from the main thread) when the child thread
> finishes (reactor.callInThread() to call stuff in a different thread,
[...]

I know they are more than just a way to run slow stuff in threads, but
once you have them, simple as they are, they present an obvious solution
to all sorts of things, including long computations in a thread.

Note that once zope2 took the approach it did, blocking the async-loop
didn't hurt so bad, so lots of zope add-ons just did it gratuitously. In
many cases the slow event handlers were slow because they are waiting on
IO that could in theory be serviced as yet another event handler in the
async-loop. However, the Zope/Medusa async framework had become so scary
hardly anyone knew how to do this without breaking Zope itself.

> > In the case of Zope/ZEO I'm not entirely sure but I think what happened
> > was medusa (asyncore/asynchat based stuff Zope2 was based on) didn't
> > have this deferred handler support. When they found some of the stuff
> 
> Here I think you meant that medusa didn't handle computation in separate
> threads instead.

No, I pretty much meant what I said :-)

Medusa didn't have any concept of a deferred, hence the idea of using
one to collect the results of a long computation in another thread never
occurred to them... remember the highly refactored OO beauty that is
twisted was not even a twinkle in anyone's eye yet.

In theory it would be just as easy to add twisted style deferToThread to
Medusa, and IMHO it is a much better approach. Unfortunately at the time
they went the other way and implemented multiple async-loops in separate
threads.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] threadsafe patch for asynchat

2006-02-08 Thread Donovan Baarda
On Wed, 2006-02-08 at 02:33 -0500, Steve Holden wrote:
> Martin v. Löwis wrote:
> > Tim Peters wrote:
[...]
> > What is the reason that people want to use threads when they can have
> > poll/select-style message processing? Why does Zope require threads?
> > IOW, why would anybody *want* a "threadsafe patch for asynchat"?
> > 
> In case the processing of events needed to block? If I'm processing web 
> requests in an async* dispatch loop and a request needs the results of a 
> (probably lengthy) database query in order to generate its output, how 
> do I give the dispatcher control again to process the next asynchronous 
> network event?
> 
> The usual answer is "process the request in a thread". That way the 
> dispatcher can spring to life for each event as quickly as needed.

I believe that Twisted does pretty much this with its "deferred" stuff.
It shoves slow stuff off for processing in a separate thread that
re-syncs with the event loop when it's finished.
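
In Twisted that looks something like (a sketch; slow_query() is made up):

    from twisted.internet import reactor
    from twisted.internet.threads import deferToThread

    def slow_query():
        pass                   # some blocking call, e.g. a database query

    def done(result):
        reactor.stop()         # called back in the event loop thread

    deferToThread(slow_query).addCallback(done)
    reactor.run()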

In the case of Zope/ZEO I'm not entirely sure but I think what happened
was medusa (asyncore/asynchat based stuff Zope2 was based on) didn't
have this deferred handler support. When they found some of the stuff
Zope was doing took a long time, they came up with an initially simpler
but IMHO uglier solution of running multiple async loops in separate
threads and using a front-end dispatcher to distribute connections to
them. This way it wasn't too bad if an async loop stalled, because the
other loops in other threads could continue to process stuff.

If ZEO is still using this approach I think switching to a twisted style
approach would be a good idea. However, I suspect this would be a very
painful refactor...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] syntactic support for sets

2006-02-06 Thread Donovan Baarda
On Mon, 2006-02-06 at 15:36 +0100, Ronald Oussoren wrote:
>  On Monday, February 06, 2006, at 03:12PM, Donovan Baarda <[EMAIL PROTECTED]> 
> wrote:
> 
> >On Fri, 2006-02-03 at 20:02 +0100, "Martin v. Löwis" wrote:
> >> Donovan Baarda wrote:
> >> > Before set() the standard way to do them was to use dicts with None
> >> > Values... to me the "{1,2,3}" syntax would have been a logical extension
> >> > of the "a set is a dict with no values, only keys" mindset. I don't know
> >> > why it wasn't done this way in the first place, though I missed the
> >> > arguments where it was rejected.
> >> 
> >> There might be many reasons; one obvious reason is that you can't spell
> >> the empty set that way.
> >
> >Hmm... how about "{,}", which is the same trick tuples use for the empty
> >tuple?
>
> Isn't () the empty tuple? I guess you're confusing this with a single element 
> tuple: (1,) instead of (1) (well actually it is "1,")

Yeah, sorry.. nasty brainfart...

> BTW. I don't like your proposal for spelling the empty set as {,} because 
> that is entirely non-obvious. If {1,2,3} where a valid way to spell a set 
> literal, I'd expect {} for the empty set.

yeah... the problem is differentiating the empty set from an empty dict.
The only alternative that occurred to me was the not-so-nice and
not-backwards-compatible "{:}" for an empty dict and "{}" for an empty
set.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] syntactic support for sets

2006-02-06 Thread Donovan Baarda
On Fri, 2006-02-03 at 20:02 +0100, "Martin v. Löwis" wrote:
> Donovan Baarda wrote:
> > Before set() the standard way to do them was to use dicts with None
> > Values... to me the "{1,2,3}" syntax would have been a logical extension
> > of the "a set is a dict with no values, only keys" mindset. I don't know
> > why it wasn't done this way in the first place, though I missed the
> > arguments where it was rejected.
> 
> There might be many reasons; one obvious reason is that you can't spell
> the empty set that way.

Hmm... how about "{,}", which is the same trick tuples use for the empty
tuple?

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] syntactic support for sets

2006-02-06 Thread Donovan Baarda
On Fri, 2006-02-03 at 11:56 -0800, Josiah Carlson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> wrote:
[...]
> > Nuff was a fairy... though I guess it depends on where you draw the
> > line; should [1,2,3] be list(1,2,3)?
> 
> Who is "Nuff"?

fairynuff... :-)

> Along the lines of "not every x line function should be a builtin", "not
> every builtin should have syntax".  I think that sets have particular
> uses, but I don't believe those uses are sufficiently varied enough to
> warrant the creation of a syntax.  I suggest that people take a walk
> through their code. How often do you use other sequence and/or mapping
> types? How many lists, tuples and dicts are there?  How many sets? Ok,
> now how many set literals?

The absence of sets in early Python, the requirement to "import sets"
when they first appeared, and the lack of a set syntax now all mean that
people tend to avoid using sets and resort to lists, tuples, and "dicts
of None" instead, even though they really want a set. Anywhere you see
"if value in sequence:", they probably mean sequence is a set, and this
code would run much faster if it really was, and might even avoid
potential bugs because it would prevent duplicates...
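
Compare (a sketch):

    colour = "green"                       # stands in for some input value

    valid = ["red", "green", "blue"]       # really meant to be a set
    if colour in valid:                    # O(n) scan of a list
        pass

    valid = set(["red", "green", "blue"])  # says what it means
    if colour in valid:                    # O(1) hash lookup
        pass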

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] syntactic support for sets

2006-02-03 Thread Donovan Baarda
On Fri, 2006-02-03 at 09:00 -0800, Josiah Carlson wrote:
[...]
> Sets are tacked on.  That's why you need to use 'import sets' to get to
> them, in a similar fashion that you need to use 'import array' to get
> access to C-like arrays.

No you don't;

$ python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> v=set((1,2,3))
>>> f=frozenset(v)
>>>

set and frozenset are now builtin.

> I personally object to making syntax for sets for the same reasons I
> object to making arrays, heapqs, Queues, deques, or any of the other
> data structure-defining modules in the standard library into syntax.

Nuff was a fairy... though I guess it depends on where you draw the
line; should [1,2,3] be list(1,2,3)?

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] syntactic support for sets

2006-02-03 Thread Donovan Baarda
On Fri, 2006-02-03 at 12:04 +, Donovan Baarda wrote:
> On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
[...]
> Personally I'd like this. Currently the "set(...)" syntax makes sets
> feel tacked on compared to tuples, lists, dicts, and strings which have
> nice built in syntax support. Many people don't realise they are there
> because of this.
[...]
> Frozensets are to sets what tuples are to lists. It would be nice if
> there was another type of bracket that could be used for frozenset...
> something like ':1,2,3:'... yuk... I dunno.

One possible bracket option for frozenset would be "<1,2,3>" which I
initially rejected because of the possible syntactic clash with the <
and > operators... however, there may be a way this could work... dunno.

The other thing that keeps nagging me is set, frozenset, tuple, and list
all overlap in functionality to fairly significant degrees. Sometimes it
feels like just implementation or application differences... could a
list that is never modified be optimised under the hood as a tuple?
Could the immutability constraint of tuples be just acquired by a list
when it is used as a key? Could a set simply be a list with unique
values? etc.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] syntactic support for sets

2006-02-03 Thread Donovan Baarda
On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
> Hi,
> 
> I have a student who may be interested in adding syntactic support for
> sets to Python, so that:
> 
> x = {1, 2, 3, 4, 5}
> 
> and:
> 
> y = {z for z in x if (z % 2)}

Personally I'd like this. Currently the "set(...)" syntax makes sets
feel tacked on compared to tuples, lists, dicts, and strings which have
nice built in syntax support. Many people don't realise they are there
because of this.

Before set() the standard way to do them was to use dicts with None
Values... to me the "{1,2,3}" syntax would have been a logical extension
of the "a set is a dict with no values, only keys" mindset. I don't know
why it wasn't done this way in the first place, though I missed the
arguments where it was rejected.

As for frozenset vs set, I would be inclined to make them normal mutable
sets. This is in line with the "dict without values" idea.

Frozensets are to sets what tuples are to lists. It would be nice if
there was another type of bracket that could be used for frozenset...
something like ':1,2,3:'... yuk... I dunno.

Alternatively you could do the same thing we do with strings; add a
prefix char for different variants; {1,2,3} is a set, f{1,2,3} is a
frozen set...

For Python 3000 you could extend this approach to lists and dicts;
[1,2,3] is a list, f[1,2,3] is a "frozen list" or tuple, {1:'a',2:'b'}
is a dict, f{1:'a',2:'b'} is a "frozen dict" which can be used as a key
in other dicts... etc.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Octal literals

2006-02-03 Thread Donovan Baarda
On Wed, 2006-02-01 at 19:09 +, M J Fleming wrote:
> On Wed, Feb 01, 2006 at 01:35:14PM -0500, Barry Warsaw wrote:
> > The proposal for something like 0xff, 0o664, and 0b1001001 seems like
> > the right direction, although 'o' for octal literal looks kind of funky.
> > Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).
> >
> > -Barry
> >
> 
> +1

+1 too. 

It seems like a "least changes" way to fix the IMHO strange 0123 != 123
behaviour. 

Any sort of arbitrary base syntax is overkill; decimal, hexadecimal,
octal, and binary cover 99.9% of cases. The 0.1% of other cases are very
special, and can use int("LITERAL",base=RADIX).
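
For example:

    int("775", 8)     # -> 509
    int("1101", 2)    # -> 13
    int("xyz", 36)    # -> 44027, the 0.1% case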

For me, binary is far more useful than octal, so I'd be happy to let
octal languish as legacy support, but I definitely want "0b10110101".

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] New Pythondoc by effbot

2006-01-24 Thread Donovan Baarda
On Sat, 2006-01-21 at 19:15 -0500, Terry Reedy wrote:
> >> http://effbot.org/lib/os.path.join
> 
> On this page, 8 of 30 entries have a 'new in' comment.  For anyone with no 
> interest in the past, these constitute noise.  I wonder if for 3.0, the 

Even the past is relative... I find the "new in" doco absolutely
essential, because my "present" depends on what system I'm on, and some
of them are stuck in a serious time-warp. I do not have a time-machine
big enough to transport whole companies.

> timer can be reset and the docs start clean again.  To keep them backwards 
> compatible, they would also have to be littered with 'changed in 3.0' and 
> 'deleted in 3.0' entries.  Better, I think, to refer people to the last 2.x 
> docs and a separate 2.x/3.0 changes doc.

I also find "changed in" essential, but I don't mind not having "deleted
in"... it encourages developers stuck in those time-warps to avoid
features that get deleted in the future :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] str with base

2006-01-18 Thread Donovan Baarda
On Tue, 2006-01-17 at 20:25 -0800, Guido van Rossum wrote:
> On 1/17/06, Bob Ippolito <[EMAIL PROTECTED]> wrote:
> > There shouldn't be a %B for the same reason there isn't an %O or %D
> > -- they're all just digits, so there's not a need for an uppercase
[...]

so %b is "binary",

+1

> > The difference between hex() and oct() and the proposed binary() is
> 
> I'd propose bin() to stay in line with the short abbreviated names.
[...]

+1

> The binary type should have a 0b prefix.
[...]

+1

For those who argue "who would ever use it?", I would :-) 

Note that this does not provide, and is independent of, support for
arbitrary bases. I don't think we need to support arbitrary bases, but
if we did I would vote for ".precision" to mean ".base" for "%d"... ie;

"%3.3d" % 5 == " 12"

I think supporting arbitrary bases for floats is way overkill and not
worth considering.
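
In the meantime, roughly what a "%b" conversion might produce for
non-negative ints (an untested sketch):

    def bin_str(n, width=0):
        # bin_str(12) -> '1100'; bin_str(12, 8) -> '00001100'
        digits = ''
        while n > 0:
            digits = str(n & 1) + digits
            n >>= 1
        digits = digits or '0'
        return '0' * (width - len(digits)) + digits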

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] str with base

2006-01-18 Thread Donovan Baarda
On Tue, 2006-01-17 at 16:38 -0700, Adam Olsen wrote:
> On 1/17/06, Thomas Wouters <[EMAIL PROTECTED]> wrote:
> > On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote:
[...]
> I dream of a day when str(3.25, base=2) == '11.01'.  That is the
> number a float really represents.  It would be so much easier to
> understand why floats behave the way they do if it were possible to
> print them in binary.
[...]

Heh... that's pretty much why I used base16 float notation when doing
fixed point stuff in assembler... uses fewer digits than binary, but
easily visualised as bits.

However, I do think that we could go overboard here... I don't know that
we really need arbitrary base string formatting for all numeric types. I
think this is a case of "very little gained for too much added
complexity".

If we really do, and someone is prepared to implement it, then I think
adding "@base" is the best way to do it (see my half joking post
earlier). 

If we only want arbitrary bases for integer types, the best way would be
to leverage off the existing ".precision" so that it means ".base" for
"%d".


> > In-favour-of-%2b-ly y'rs,
> 
> My only opposition to this is that the byte type may want to use it. 
> I'd rather wait until byte is fully defined, implemented, and released
> in a python version before that option is taken away.

There's always "B" for bytes and "b" for bits... though I can't imagine
why byte would need it's own conversion type.

I'm not entirely sure everyone is on the same page for "%b" here... it
would only be a shorthand for "binary" in the same way that "%x" is for
"hexidecimal". It would not support arbitrary bases, and thus "%2b"
would mean a binary string with minimum length of 2 characters.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] str with base

2006-01-17 Thread Donovan Baarda
On Tue, 2006-01-17 at 10:05 +, Nick Craig-Wood wrote:
> On Mon, Jan 16, 2006 at 11:13:27PM -0500, Raymond Hettinger wrote:
[...]
> Another suggestion would be to give hex() and oct() another parameter,
> base, so you'd do hex(123123123, 2). Perhaps a little
> counter-intuitive, but if you were looking for base conversion
> functions you'd find hex() pretty quickly and the documentation would
> mention the other parameter.

Ugh!

I still favour extending % format strings. I really like '%b' for
binary, but if arbitrary bases are really wanted, then perhaps also
leverage off the "precision" value for %d to indicate base such that
'%3.3d' % 5 == " 12"

If people think that using "." is for "precision" and is too ambiguous
for "base", you could do something like extend the whole conversion
specifier to (in EBNF)

%[width][.precision][@base]conversion

which would allow for weird things like "%4.0@3f" % 5.5 == " 12."

Note: it is possible for floats to be represented in non-decimal number
systems, it's just extremely rare for anyone to do it. I have in my
distant past used base 16 float notation for fixed-point numbers.

I personally think %b would be adding enough. The other suggestions are
just me being silly :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] str with base

2006-01-17 Thread Donovan Baarda
On Tue, 2006-01-17 at 01:03 -0500, Barry Warsaw wrote:
> On Mon, 2006-01-16 at 20:49 -0800, Bob Ippolito wrote:
> 
> > The only bases I've ever really had a good use for are 2, 8, 10, and  
> > 16.  There are currently formatting codes for 8 (o), 10 (d, u), and  
> > 16 (x, X).  Why not just add a string format code for unsigned  
> > binary?  The obvious choice is probably "b".
> > 
> > For example:
> > 
> >  >>> '%08b' % (12)
> > '00001100'
> >  >>> '%b' % (12)
> > '1100'
> 
> +1

+1 me too.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] [Python-checkins] commit of r41880 - python/trunk/Python/Python-ast.c

2006-01-03 Thread Donovan Baarda
On Mon, 2006-01-02 at 15:16 -0800, Neal Norwitz wrote:
> On 1/2/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> > I think we have a fundamental problem with Python-ast.c and
> > Python-ast.h.  These files should not be both auto-generated and checked
> > into Subversion.
> 
> I agree with the problem statement.
> 
> > The general rule should be that no file that is ever generated can be
> > checked into Subversion.  Probably the right approach is to check in a
> > template file that will not get removed by a distclean, and modify the
> > build process to generate Python-ast.* from those template files.
> 
> I'm not sure about your proposed solution, though.
> 
> There's a bootstrapping issue.  Python-ast.[ch] are generated by a
> python 2.2+ script.  /f created a bug report if only 2.1 is available.
> 
> The Python-ast.[ch] should probably not be removed by distclean.  This
> is similar to configure.  Would that make you happy?  What else would
> improve the current situation?
> 
> If you go the template route, you would just copy the files. That
> doesn't seem to gain anything.

The solution I use is to never have anything auto-generated in CVS/SVN,
but have "make dist" generate and include anything needed for
bootstrapping in the distribution tarball (or whatever). Doing "make
distclean" should delete enough to bring you back to a freshly extracted
distribution tarball, and "make maintainer-clean" should delete all
auto-generated files to bring you back to a clean CVS/SVN checkout.

I tend to include quite a few generated files in the distribution
tarball that are not in CVS/RCS. Things like ChangeLog (generated by
cvs2cl), all the autotools autogen'ed files, generated datafiles, etc.

This way your source distributions don't have any bootstrap problems,
but you also don't have any auto-generated files in CVS/SVN and the
problems they create. It does however mean that a fresh CVS/SVN checkout
does have additional build requirements above and beyond building from a
source distribution.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] [Doc-SIG] that library reference, again

2005-12-30 Thread Donovan Baarda
; workflow rather
than the "submit bug" workflow, and maybe that will make things easier
for the big picture "update and release docs" workflow. But the speed of
the tool-chain has little to do with this, only the "documentation
language" popularity among the developers and users.

...and if the LaTeX guys don't mind fixing bugs instead of applying
patches and are handling the load... the status quo is fine by me, I'm
happy not to do documentation :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] When do sets shrink?

2005-12-29 Thread Donovan Baarda
On Thu, 2005-12-29 at 17:17 +0100, Fredrik Lundh wrote:
> Noam Raphael wrote:
> 
> > I'm not saying that practically it must be used - I'm just saying that
> > it can't be called a heuristic, and that it doesn't involve any "fancy
> > overkill size hinting or history tracking". It actually means
> > something like this:
> > 1. If you want to insert and the table is full, resize the table to
> > twice the current size.
> > 2. If you delete and the number of elements turns out to be less than
> > a quarter of the size of the table, resize the table to half of the
> > current size.
> 
> sure sounds like a heuristic algorithm to me... (as in "not guaranteed to
> be optimal under all circumstances, even if it's probably quite good in all
> practical cases")
> 
> 

My problem with this heuristic is it doesn't work well for the
probably-fairly-common use-case of; fill it, empty it, fill it, empty
it, fill it...etc.

As Guido pointed out, if you do have a use-case where a container gets
very big, then shrinks and doesn't grow again, you can manually clean up
by creating a new container and del'ing the old one. If the
implementation is changed to use this heuristic, there is no simple way
to avoid the re-allocs for this use-case... (don't empty, but fill with
None... ugh!).
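
The use-case I mean is something like (a sketch; produce_batch() and
consume() are made up):

    work = set()
    while True:
        work.update(produce_batch())   # grows to about the same size
        while work:
            consume(work.pop())        # empties it again; shrinking here
                                       # just forces a re-grow next cycle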

My gut feeling is this heuristic will cause more pain than it would
gain...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] file() vs open(), round 7

2005-12-29 Thread Donovan Baarda
On Sun, 2005-12-25 at 20:38 -0800, Aahz wrote:
> Guido sez in 
> http://mail.python.org/pipermail/python-dev/2004-July/045921.html
> that it is not correct to recommend using ``file()`` instead of
> ``open()``.  However, because ``open()`` currently *is* an alias to
> ``file()``, we end up with the following problem (verified in current
> HEAD) where doing ``help(open)`` brings up the docs for ``file()``:
[...]
> This is confusing.  I suggest that we make ``open()`` a factory function
> right now.  (I'll submit a bug report (and possibly a patch) after I get
> agreement.)

Not totally related but...

way back in 2001-2002, I did some work on writing a Virtual File System
interface for Python. See;

http://minkirri.apana.org.au/~abo/projects/osVFS

The idea was that you could import a module "vfs" as "os", and then any
file operations would go through the virtual file system. I had modules
for things "fakeroot", "mountable", "ftpfs" etc. The vfs module had full
os functionality so it was a "drop in replacement".

The one wart was open(), because it is the only filesystem operation
that wasn't in the os module. At the time I worked around this by adding
a vfs.file() method, and suggesting that people alias open() to
vfs.file(). Note that os.open() already exists as a low-level file open
function, and hence could not be used as a file-object-factory method.
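
So users of the VFS ended up doing something like (a sketch, from
memory):

    import vfs as os    # all os.* filesystem calls now go via the VFS
    open = os.file      # ...and so does the one wart, open()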

I'm wondering if it wouldn't be a good idea to centralise all filesystem
operations into the os module, including open() or file(). Perhaps the
builtin open() and file() could call os.file()... or P3K could remove
the builtins... I dunno... it just felt ugly at the time that open() was
the one oddity.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] When do sets shrink?

2005-12-29 Thread Donovan Baarda
On Wed, 2005-12-28 at 18:57 -0500, Raymond Hettinger wrote:
[...]
> What could be done is to add a test for excess dummy entries and trigger
> a periodic resize operation.  That would make the memory available for
> other parts of the currently running script and possibly available for
> the O/S.
> 
> The downside is slowing down a fine-grained operation like pop().  For
> dictionaries, this wasn't considered worth it.  For sets, I made the
> same design decision.  It wasn't an accident.  I don't plan on changing
> that decision unless we find a body of real world code that would be
> better-off with more frequent re-sizing.

I don't think it will ever be worth it.

Re-allocations that grow are expensive, as they often need to move the
entire contents from the old small allocation to the new larger
allocation. Re-allocations that shrink can also be expensive, or at the
least increase heap fragmentation. So you want to avoid re-allocations
whenever possible.

The ideal size for any container is "as big as it needs to be". The best
heuristic for this is probably "as big as it's ever been, or if it just
got bigger than that, assume it's half way through growing", which is
what Python currently does.

Without some sort of fancy overkill size hinting or history tracking,
that's probably as good a heuristic as you can get.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] (no subject)

2005-11-24 Thread Donovan Baarda
On Thu, 2005-11-24 at 14:11 +, Duncan Grisby wrote:
> Hi,
> 
> I posted this to comp.lang.python, but got no response, so I thought I
> would consult the wise people here...
> 
> I have encountered a problem with the re module. I have a
> multi-threaded program that does lots of regular expression searching,
> with some relatively complex regular expressions. Occasionally, events
> can conspire to mean that the re search takes minutes. That's bad
> enough in and of itself, but the real problem is that the re engine
> does not release the interpreter lock while it is running. All the
> other threads are therefore blocked for the entire time it takes to do
> the regular expression search.

I don't know if this will help, but in my experience compiling re's
often takes longer than matching them... are you sure that it's the
match and not a compile that is taking a long time? Are you using
pre-compiled re's or are you dynamically generating strings and using
them?
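
i.e. the difference between compiling per call and compiling once up
front (a sketch; the file and pattern are made up):

    import re

    pattern = re.compile(r'foo(bar)*baz')   # compile cost paid once
    for line in open('input.txt'):
        match = pattern.search(line)        # match cost only, per line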

> Is there any fundamental reason why the re module cannot release the
> interpreter lock, for at least some of the time it is running?  The
> ideal situation for me would be if it could do most of its work with
> the lock released, since the software is running on a multi processor
> machine that could productively do other work while the re is being
> processed. Failing that, could it at least periodically release the
> lock to give other threads a chance to run?
> 
> A quick look at the code in _sre.c suggests that for most of the time,
> no Python objects are being manipulated, so the interpreter lock could
> be released. Has anyone tried to do that?

probably not... not many people would have several-minutes-to-match
re's.

I suspect it would be do-able... I suggest you put together a patch and
submit it on SF...


-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] urlparse brokenness

2005-11-24 Thread Donovan Baarda
On Tue, 2005-11-22 at 23:04 -0600, Paul Jimenez wrote:
> It is my assertion that urlparse is currently broken.  Specifically, I 
> think that urlparse breaks an abstraction boundary with ill effect.
> 
> In writing a mailclient, I wished to allow my users to specify their
> imap server as a url, such as 'imap://user:password@host:port/'. Which
> worked fine. I then thought that the natural extension to support

FWIW, I have a small addition related to this that I think would be
handy to add to the urlparse module. It is a pair of functions
"netlocparse()" and "netlocunparse()" that is for parsing and unparsing
"user:[EMAIL PROTECTED]:port" netloc's.

Feel free to use/add/ignore it...

http://minkirri.apana.org.au/~abo/projects/osVFS/netlocparse.py
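
The guts of it amount to something like this (a sketch, not necessarily
the module's actual API):

    def netlocparse(netloc):
        user = passwd = port = None
        host = netloc
        if '@' in host:
            auth, host = host.split('@', 1)
            if ':' in auth:
                user, passwd = auth.split(':', 1)
            else:
                user = auth
        if ':' in host:
            host, port = host.split(':', 1)
        return user, passwd, host, port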

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Coroutines (PEP 342)

2005-11-15 Thread Donovan Baarda
On Mon, 2005-11-14 at 15:46 -0700, Bruce Eckel wrote:
[...]
> What is not clear to me, and is not discussed in the PEP, is whether
> coroutines can be distributed among multiple processors. If that is or
> isn't possible I think it should be explained in the PEP, and I'd be
> interested in know about it here (and ideally why it would or wouldn't
> work).

Even if different coroutines could be run on different processors, there
would be nothing gained except extra overheads of interprocessor memory
duplication and communication delays.

The whole communication-via-yield-and-send model effectively means
only one co-routine is running at a time, and all the others are blocked
waiting for a yield or send.

This was the whole point; it is a convenient abstraction that appears to
do work in parallel, while actually doing it sequentially, avoiding the
overheads and possible race conditions of threads.
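
A PEP 342 sketch of what I mean:

    def printer():
        while True:
            line = (yield)      # suspends here until someone send()s
            print line

    p = printer()
    p.next()                    # prime it: run to the first yield
    p.send("hello")             # runs printer until it yields again
    p.send("world")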

It has the problem that a single co-routine can monopolise execution,
hence the other name "co-operative multi-tasking", where co-operation is
the requirement for it to work.

At least... that's the way I understood it... I could be totally
mistaken...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-10 Thread Donovan Baarda
On Thu, 2005-11-10 at 00:40 -0500, Michiel Jan Laurens de Hoon wrote:
> Stephen J. Turnbull wrote:
> 
> >Michiel> What is the advantage of Tk in comparison to other GUI
> >Michiel> toolkits?
[...]
> My application doesn't need a toolkit at all. My problem is that because 
> of Tkinter being the standard Python toolkit, we cannot have a decent 
> event loop in Python. So this is the disadvantage I see in Tkinter.
[...]

I'm kind of surprised no-one has mentioned Twisted in this thread.

Twisted is an async-framework that I believe has support for using a
variety of different event-loops, including Tkinter and wxWidgets, as
well as its own.

It has been heavily re-factored many times, so if you want to see the
current Python "state of the art" way of doing this, I'd be having a
look at what they are doing.
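
e.g. driving Tkinter from the twisted reactor is just (a sketch):

    import Tkinter
    from twisted.internet import tksupport, reactor

    root = Tkinter.Tk()
    tksupport.install(root)     # the reactor now pumps Tk events too
    reactor.run()               # one loop services network and GUI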

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Pythonic concurrency

2005-10-10 Thread Donovan Baarda
On Mon, 2005-10-10 at 18:59, Bill Janssen wrote:
> > The problem with threads is at first glance they appear easy...
> 
> Anyone who thinks that a "glance" is enough to understand something is
> too far gone to worry about.  On the other hand, you might be
> referring to a putative brokenness of the Python documentation on
> Python threads.  I'm not sure they're broken, though.  They just point
> out the threading that Python provides, for folks who want to use
> threads.  Are they required to provide a full course in threads?

I was speaking in general, not about Python in particular. If anything,
Python is one of the simplest and safest platforms for threading (thanks
mostly to the GIL). And I find the documentation excellent :-)

> > ...which seduces many beginning programmers into using them.
> 
> Don't worry about this.  That's how "beginning programmers" learn.

Many other things "beginning programmers" learn very quickly break if
you do it wrong, until you learn to do it right. Threads are tricky in
that they can "mostly work", and it can be a long while before you
realise it is actually broken.

I don't know how many bits of other people's code I've had to fix that
worked for years until it was run on hardware fast enough to trigger
that nasty race condition :-)
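
The classic "mostly works" example (a sketch; run it enough times, or on
fast enough hardware, and updates go missing):

    import threading

    counter = 0

    def bump(n):
        global counter
        for i in xrange(n):
            counter += 1       # read/modify/write: not atomic, even
                               # with the GIL

    threads = [threading.Thread(target=bump, args=(100000,))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print counter              # often less than 200000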

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Pythonic concurrency

2005-10-10 Thread Donovan Baarda
On Fri, 2005-10-07 at 17:47, Bruce Eckel wrote:
> Early in this thread there was a comment to the effect that "if you
> don't know how to use threads, don't use them," which I pointedly
> avoided responding to because it seemed to me to simply be
> inflammatory. But Ian Bicking just posted a weblog entry:
> http://blog.ianbicking.org/concurrency-and-processes.html where he
> says "threads aren't as hard as they imply" and "An especially poor
> argument is one that tells me that I'm currently being beaten with a
> stick, but apparently don't know it."

The problem with threads is at first glance they appear easy, which
seduces many beginning programmers into using them. The hard part is
knowing when and how to lock shared resources... at first glance you
don't even realise you need to do this. So many threaded applications
are broken and don't know it, because this kind of broken-ness is nearly
always intermittent and very hard to reproduce and debug.

One common alternative is async polling frameworks like Twisted. These
scare beginners away because at first glance, they appear hideously
complicated. However, if you take the time to get your head around them,
you get a better feel for all the nasty implications of concurrency, and
end up designing better applications.

This is the reason why, given a choice between an async and a threaded
implementation of an application, I will always choose the async
solution. Not because async is inherently better than threading, but
because the programmer who bothered to grok async is more likely to get
it right.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Pythonic concurrency

2005-10-10 Thread Donovan Baarda
On Fri, 2005-10-07 at 23:54, Nick Coghlan wrote:
[...]
> The few times I have encountered anyone saying anything resembling "threading 
> is easy", it was because the full sentence went something like "threading is 
> easy if you use message passing and copy-on-send or release-reference-on-send 
> to communicate between threads, and limit the shared data structures to those 
> required to support the messaging infrastructure". And most of the time there 
> was an implied "compared to using semaphores and locks directly, " at the 
> start.

LOL! So threading is easy if you restrict inter-thread communication to
message passing... and what makes multi-processing hard is your only
inter-process communication mechanism is message passing :-)

Sounds like yet another reason to avoid threading and use processes
instead... effort spent on threading based message passing
implementations could instead be spent on inter-process messaging.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] GIL, Python 3, and MP vs. UP

2005-09-21 Thread Donovan Baarda
On Tue, 2005-09-20 at 22:43, Guido van Rossum wrote:
> On 9/20/05, John J Lee <[EMAIL PROTECTED]> wrote:
[...]
> I don't know that any chips are designed with threading in mind. Fast
> threading benefits from fast context switches which benefits from
> small register sets. I believe the trend is towards ever large
> register sets. Also, multiple processors with shared memory don't
> scall all that well; multiple processors with explicit IPC channels
> scale much better. All arguments for multi-processing and against
> multi-threading.

Exactly! 

I believe the latest MP opteron chipsets use hypertransport busses to
directly access the other processor's memory and possibly CPU cache. In
theory this means shared memory will not hurt too badly, helping
threading. However, memory contention bottlenecks and cache coherency
will always mean shared memory hurts more, and will never scale better,
than IPC.

The reality is threads were invented as a low overhead way of easily
implementing concurrent applications... ON A SINGLE PROCESSOR. Taking
into account threading's limitations and objectives, Python's GIL is the
best way to support threads. When hardware (seriously) moves to multiple
processors, other concurrency models will start to shine. 

In the short term there will be various hacks to try and make the
existing plethora of threading applications run better on multiple
processors, but ultimately the overheads of shared memory will force
serious multi-processing to use IPC channels. If you want serious MP,
use processes, not threads.

I see anti-GIL threads again and again. Get over it... the GIL rocks for
threads :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] reference counting in Py3K

2005-09-07 Thread Donovan Baarda
On Wed, Sep 07, 2005 at 02:01:01AM -0400, Phillip J. Eby wrote:
[...]
> Just an FYI; Pyrex certainly makes it relatively painless to write code 
> that interfaces with C, but it doesn't do much for performance, and 
> naively-written Pyrex code can actually be slower than carefully-optimized 
> Python code.  So, for existing modules that were written in C for 
> performance reasons, Pyrex isn't currently a substitute.

I just want to second this; my experiments with pyrex on pysync
produced no speedups. I got a much more noticable speed benefit from
psyco. This was admittedly a long time ago...

-- 
--------
Donovan Baarda    http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Remove str.find in 3.0?

2005-08-27 Thread Donovan Baarda
On Sat, 2005-08-27 at 10:16 -0700, Josiah Carlson wrote:
> Guido van Rossum <[EMAIL PROTECTED]> wrote:
[...]
> Oh, there's a good thing to bring up; regular expressions!  re.search
> returns a match object on success, None on failure.  With this "failure
> -> Exception" idea, shouldn't they raise exceptions instead?  And
> goodness, defining a good regular expression can be quite hard, possibly
> leading to not insignificant "my regular expression doesn't do what I
> want it to do" bugs.  Just look at all of those escape sequences and the
> syntax! It's enough to make a new user of Python gasp.

I think re.match() returning None is an example of 1b (as categorised by
Terry Reedy). In this particular case a 1b style response is OK. Why;

1) any successful match evaluates to "True", and None evaluates to
"False". This allows simple code like;

  if myreg.match(s):
      do something.

Note you can't do this for find, as 0 is a successful "find" and
evaluates to False, whereas other results including -1 evaluate to True.
Even worse, -1 is a valid index.

2) exceptions are for unexpected events, where unexpected means "much
less likely than other possibilities". The re.match() operation asks
"does this match this", which implies you have an about even chance of
not matching... ie a failure to match is not unexpected. The result None
makes sense... "what match did we get? None, OK".

For str.index() you are asking "give me the index of this inside this",
which implies you expect it to be in there... ie not finding it _is_
unexpected, and should raise an exception.

Note that re.match() returning None will raise exceptions if the rest of
your code doesn't expect it;

index = myreg.match(s).start()
tail = s[index:]

This will raise an exception if there was no match.

Unlike str.find();

index = s.find(r)
tail = s[index:]

Which will happily return the last character if there was no match. This
is why find() should return None instead of -1.
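
For example;

>>> s = "hello"
>>> s.find("z")
-1
>>> s[s.find("z"):]  # -1 is a valid index, so no exception...
'o'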

> With the existance of literally thousands of uses of .find and .rfind in
> the wild, any removal consideration should be weighed heavily - which
> honestly doesn't seem to be the case here with the ~15 minute reply time
> yesterday (just my observation and opinion).  If you had been ruminating
> over this previously, great, but that did not seem clear to me in your
> original reply to Terry Reedy.

bear in mind they are talking about Python 3.0... I think :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)

2005-08-24 Thread Donovan Baarda
On Wed, 2005-08-24 at 07:33, "Martin v. Löwis" wrote:
> Walter Dörwald wrote:
> > Martin v. Löwis wrote:
> > 
> >> Walter Dörwald wrote:
[...]
> Actually, on a second thought - it would not remove the quadratic
> aspect. You would still copy the rest string completely on each
> split. So on the first split, you copy N lines (one result line,
> and N-1 lines into the rest string), on the second split, N-2
> lines, and so on, totalling N*N/2 line copies again. The only
> thing you save is the join (as the rest is already joined), and
> the IsLineBreak calls (which are necessary only for the first
> line).
[...]

In the past, I've avoided the string copy overhead inherent in split()
by using buffers...
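
Something like this rough sketch, using the old builtin buffer type;

def iter_lines(data):
    # yield zero-copy views into the one underlying string, instead of
    # copying each line out of it the way split() does
    start = 0
    while True:
        end = data.find("\n", start)
        if end < 0:
            yield buffer(data, start)   # the tail, no copy
            return
        yield buffer(data, start, end - start)
        start = end + 1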

I've always wondered why Python didn't use buffer type tricks internally
for split-type operations. I haven't looked at Python's string
implementation, but the fact that strings are immutable surely means
that you can safely and efficiently reference an implementation level
"data" object for all strings... ie all strings are "buffers".

The only problem I can see with this is huge "data" objects might hang
around just because some small fragment of it is still referenced by a
string. Surely a simple heuristic or two like "if len(string) <
len(data)/8: copy data; else: reference data" would go a long way
towards avoiding that.

In my limited playing around with manipulating strings and
benchmarking stuff, the biggest overhead is nearly always the copies.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Fwd: Distributed RCS

2005-08-15 Thread Donovan Baarda
On Mon, 2005-08-15 at 04:30, Benji York wrote:
> Martin v. Löwis wrote:
> > [EMAIL PROTECTED] wrote:
> >>Granted.  What is the cost of waiting a bit longer to see if it (or
> >>something else) gets more usable and would hit the mark better than svn?
> > 
> > It depends on what "a bit" is. Waiting a month would be fine; waiting
> > two years might be pointless.
> 
> This might be too convoluted to consider, but I thought I might throw it 
> out there.  We use svn for our repositories, but I've taken to also 
> using bzr so I can do local commits and reversions (within a particular 
> svn reversion).  I can imagine expanding that usage to sharing branches 
> and such via bzr (or mercurial, which looks great), but keeping the 
> trunk in svn.

Not too convoluted at all; I already do exactly this with many upstream
CVS and SVN repositories, using a local PRCS for my own branches. I'm
considering switching to a distributed RCS for my own branches because
it would make it easier for others to share them.

I think this probably is the best solution; it gives a reliable(?)
centralised RCS for the trunk, but allows distributed development.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-08 Thread Donovan Baarda
On Mon, 2005-08-08 at 17:51, Trent Mick wrote:
[...]
> [Donovan Baarda wrote]
> > On Mon, 2005-08-08 at 15:49, Trent Mick wrote:
[...]
> You want to do checkins of code in a consisten state. Some large changes
> take a couple of days to write. During which one may have to do a couple
> minor things in unrelated sections of a project. Having some mechanism
> to capture some thoughts and be able to say "these are the relevant

I don't need my checkins to be in a consistent state if I'm working on a
separate branch. I can check in any time I want to record a development
checkpoint... I can capture the thoughts in the version history of the
branch.

> source files for this work" is handy. Creating a branch for something
> that takes a couple of days is overkill.
[...]
> The alternative being either that you have separate branches for
> everything (can be a pain) or just check-in for review (possibly

It all comes down to how painless branch/merge is. Many esoteric
"features" of version control systems feel like they are there to
work around the absence of proper branch/merge histories.

Note: SVN doesn't have branch/merge histories either.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-08 Thread Donovan Baarda
On Mon, 2005-08-08 at 15:49, Trent Mick wrote:
> One feature I like in Perforce (which Subversion doesn't have) is the
> ability to have pending changesets. A changeset is, as with subversion,
> something you check-in atomically. Pending changesets in Perforce allow
> you to (1) group related files in a source tree where you might be
> working on multiple things at once to ensure and (2) to build a change
> description as you go. In a large source tree this can be useful for
> separating chunks of work.

This seems like a poor workaround for crappy branch/merge support. 

I'm new to perforce, but the pending changesets seem dodgy to me... you
are accumulating changes gradually without recording any history during
the process... ie, no checkins until the end.

Even worse, perforce seems to treat clients like "unversioned branches",
allowing you to review and test pending changesets in other clients.
This supposedly allows people to review/test each others changes before
they are committed. The problem is, since these changes are not
committed, there is no firm history of what what was reviewed/tested vs
what gets committed... ie they could be different.

Having multiple different pending changesets in one large source tree
also feels like a workaround for high client overheads. Trying to
develop and test a mixture of different changes in one source tree is
asking for trouble... they can interact.

Maybe I just haven't grokked perforce yet... which might be considered a
black mark against it's learning curve :-)

For me, the logical way to group a collection of changes is in a branch.
This allows you to commit and track history of the collection of
changes. You check out each branch into different directories and
develop/test them independently. The branch can then be reviewed and
merged when it is complete.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-07 Thread Donovan Baarda
Martin v. Löwis wrote:
> Donovan Baarda wrote:
> 
>>Yeah. IMHO the sadest thing about SVN is it doesn't do branch/merge
>>properly. All the other cool stuff like renames etc is kinda undone by
>>that. For a definition of properly, see;
>>
>>http://prcs.sourceforge.net/merge.html
> 
> 
> Can you please elaborate? I read the page, and it seems to me that
> subversion's merge command works exactly the way described on the
> page.

maybe it's changed since I last looked at it, but last time I looked SVN 
didn't track merge histories. From the svnbook;

"Unfortunately, Subversion is not such a system. Like CVS, Subversion 
1.0 does not yet record any information about merge operations. When you 
commit local modifications, the repository has no idea whether those 
changes came from running svn merge, or from just hand-editing the files."

What this means is SVN has no way of automatically identifying the 
common version. An svn merge requires you to manually identify and 
specify the "last common point" where the branch was created or last 
merged. PRCS automatically finds the common version from the 
branch/merge history, and even remembers the 
merge/replace/nothing/delete decision you make for each file as the 
default to use for future merges.

You can see this in the command line differences. For subversion;

# create and checkout branch my-calc-branch
$ svn copy http://svn.example.com/repos/calc/trunk \
      http://svn.example.com/repos/calc/branches/my-calc-branch \
      -m "Creating a private branch of /calc/trunk."
$ svn checkout http://svn.example.com/repos/calc/branches/my-calc-branch

# merge and commit changes from trunk
$ svn merge -r 341:HEAD http://svn.example.com/repos/calc/trunk
$ svn commit -m "Merged trunk changes to my-calc-branch."

# merge and commit more changes from trunk
$ svn merge -r 345:HEAD http://svn.example.com/repos/calc/trunk
$ svn commit -m "Merged trunk changes to my-calc-branch."

Note that 341 and 345 are "magic" version numbers which correspond to 
the trunk version at the time of branch and first merge respectively. It 
is up to the user to figure out these versions using either meticulous 
use of tags or svn logs.

In PRCS;

# create and checkout branch my-calc-branch
$ prcs checkout calc -r 0
$ prcs checkin -r my-calc-branch -m "Creating my-calc-branch"

# merge and commit changes from trunk
$ prcs merge -r 0
$ prcs checkin -m " merged changes from trunk"

# merge and commit more changes from trunk
$ prcs merge -r 0
$ prcs checkin -m " merged changes from trunk"

Note that "-R 0" means "HEAD of trunk branch", and "-r my-calc-branch" 
means "HEAD of my-calc-branch". There is no need to figure out what 
versions of those branches to use as the "changes from" point, because 
PRCS figures it out for you. Not only that, but if you chose to ignore 
changes in certain files during the first merge, that choice is 
remembered as the default action for the second merge.

--
Donovan Baarda


Re: [Python-Dev] Syscall Proxying in Python

2005-08-02 Thread Donovan Baarda
On Tue, 2005-08-02 at 11:59, Gabriel Becedillas wrote:
> Donovan Baarda wrote:
[...]
> > Wow... you guys sure did it the hard way. If you had done it at the
> > Python level, you would have had a much easier time of both implementing
> > and updating it.
[...]
> Hi, thanks for your reply.
> The problem I see with the aproach you're sugesting is that I have to 
> rewrite a lot of code to make it work the way I want. We allready have 
> the syscall proxying stuff with an stdio layer on top of it. I should 
> have to rewrite some parts of some modules and use my own versions of 
> stdio functions, and that is pretty much the same as we have done before.
> There are also native objects that use stdio functions, and I should 
> replace those ones too, or modules that have some native code that uses 
> stdio, or sockets. I should duplicate those files, and make the same 
> kind of search/replace work that we have done previously and that we'd 
> like to avoid.
> Please let me know if I misunderstood you.

Nope... you got it all figured out. I guess it depends on what degree of
"proxying" you want... I thought there was some stuff you wanted
redirected, and some you didn't. The point is, you _can_ do this at the
Python level, and you only have to modify Python code, not C Python
source. 

However, if you want to proxy everything, then the glib wrapper is
probably the best approach, provided you really want to code in C and
have your own Python binary.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-02 Thread Donovan Baarda
On Tue, 2005-08-02 at 09:06, François Pinard wrote:
> [Raymond Hettinger]
> 
> > >http://www.venge.net/monotone/
> 
> > The current release is 0.21 which suggests that it is not ready for
> > primetime.
> 
> It suggests it, yes, and to me as well.  On the other hand, there is
> a common prejudice that something requires many releases, or frequent
> releases, to be qualified as good.  While it might be true on average,
> this is not necessarily true: some packages need not so many steps for
> becoming very usable, mature or stable.  (Note that I'm not asserting
> anything about Monotone, here.)  We should merely keep an open mind.

It is true that some well designed/developed software becomes reliable
very quickly. However, it still takes heavy use over time to prove that.
You don't want to be the guy who finds out that this is not one of those
bits of software.

IMHO you need maturity for revision control software... you are relying
on it for history. The only open source options worth considering for
Python are CVS and SVN, and even SVN is questionable (see bdb backend
issues).

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Syscall Proxying in Python

2005-08-01 Thread Donovan Baarda
On Mon, 2005-08-01 at 10:36, Gabriel Becedillas wrote:
> Hi,
> We embbeded Python 2.0.1 in our product a few years ago and we'd like to
> upgrade to Python 2.4.1. This was not a simple task, because we needed 
> to execute syscalls on a remote host. We modified Python's source code 
> in severall places to call our own versions of some functions. For 
> example, instead of calling fopen(...), the source code was modified to 
> call remote_fopen(...), and the same was done with other libc functions. 
> Socket functions where hooked too (we modified socket.c), Windows 
> Registry functions, etc..

Wow... you guys sure did it the hard way. If you had done it at the
Python level, you would have had a much easier time of both implementing
and updating it.

As an example, have a look at my osVFS stuff. This is a replacement for
the os module and open() that tricks Python into using a virtual file
system;

http://minkirri.apana.org.au/~abo/projects/osVFS
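
The general trick is simply to replace the relevant Python-level entry
points. A rough sketch of the idea (not the actual osVFS code; the
/vfsroot redirection policy is made up);

import __builtin__, os.path

_real_open = __builtin__.open

def vfs_open(path, mode="r", *args):
    # redirect every path under a virtual root before really opening it
    return _real_open("/vfsroot" + os.path.abspath(path), mode, *args)

__builtin__.open = vfs_open  # unmodified Python code now goes via the VFS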


-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-01 Thread Donovan Baarda
On Sun, 2005-07-31 at 23:54, Stephen J. Turnbull wrote:
> >>>>> "BAW" == Barry Warsaw <[EMAIL PROTECTED]> writes:
> 
> BAW> So are you saying that moving to svn will let us do more long
> BAW> lived branches?  Yay!
> 
> Yes, but you still have to be disciplined about it.  svn is not much
> better than cvs about detecting and ignoring spurious conflicts due to
> code that gets merged from branch A to branch B, then back to branch
> A.  Unrestricted cherry-picking is still out.

Yeah. IMHO the saddest thing about SVN is it doesn't do branch/merge
properly. All the other cool stuff like renames etc is kinda undone by
that. For a definition of properly, see;

http://prcs.sourceforge.net/merge.html

This is why I don't bother migrating any existing CVS projects to SVN;
the benefits don't yet outweigh the pain of migrating. For new projects
sure, SVN is a better choice than CVS.

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-06-27 Thread Donovan Baarda
On Mon, 2005-06-27 at 14:25, Phillip J. Eby wrote:
[...]
> As for the open issues, if we can't reach some sane compromise about 
> atime/ctime/mtime, I'd suggest just providing the stat() method and let 
> people use stat().st_mtime et al.  Alternately, I'd be okay with creating 
> last_modified(), last_accessed(), and created_on() methods that return 
> datetime objects, as long as there's also atime()/mtime()/ctime() methods 
> that return timestamps.

+1 for atime/mtime/ctime being timestamps
-1 for redundant duplicates that return DateTimes
+1 for a stat() method (there are lots of other goodies in a stat).

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Recommend accepting PEP 312 --Simple Implicit Lambda

2005-06-19 Thread Donovan Baarda
Josiah Carlson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> wrote:
> 
>>Nick Coghlan wrote:
>>
>>>Donovan Baarda wrote:
[...]
>>But isn't a function just a deferred expression with a name :-)
> 
> 
> A function in Python is actually a deferred sequence of statements and
> expressions. An anonymous function in Python (a lambda) is a deferred
> expression.

in the end though, a sequence of statements that completes with a 
"return value" is, when treated as a black box, indistinguishable from 
an expression. Originally I thought that this also had to be qualified 
with "and has no side-effects", but I see now that is not the case.

[...]
>>Oh yeah Raymond: on the "def defines some variable name"... are you 
>>joking? You forgot the smiley :-)
> 
> 
> 'def' happens to bind the name that follows the def to the function with
> the arguments and body following the name.

Yeah, but we don't use "def" to bind arbitrary variables, only 
functions/procedures. So in python, they are intimately identified with 
functions and procedures.

>>I don't get what the problem is with mixing statement and expression 
>>semantics... from a practical point of view, statements just offer a 
>>superset of expression functionality.
> 
> 
> Statements don't have a return value.  To be more precise, what is the
> value of "for i in xrange(10): z.append(...)"?  Examine the selection of
> statements available to Python, and ask that question.  The only one
> that MAY have a return value, is 'return' itself, which really requires
> an expression to the right (which passes the expression to the right to
> the caller's frame).  When you have statements that ultimately need a
> 'return' for a return value; you may as well use a standard function
> definition.

Hmmm. For some reason I thought that these kinds of things would have a 
return value of None, the same as a function without an explicit return. 
I see now that this is not true...

>>If there really is a serious practical reason why they must be limited 
>>to expressions, why not just raise an exception or something if the 
>>"anonymous function" is too complicated...
> 
> 
> Define "too complicated"?

I was thinking that this is up to the interpreter... depending on what 
the practical limitations are that cause the limitation in the first 
place. For example... if it can't be reduced to an "expression" through 
simple transforms.

But look... I've gone and created another monster thread on 
"alternatives to lambda"... I'm going to shut up now.

--
Donovan Baarda


Re: [Python-Dev] Recommend accepting PEP 312 -- Simple Implicit Lambda

2005-06-18 Thread Donovan Baarda
Nick Coghlan wrote:
> Donovan Baarda wrote:
> 
>>As I see it, a lambda is an anonymous function. An anonymous function is 
>>a function without a name.
> 
> 
> And here we see why I'm such a fan of the term 'deferred expression' 
> instead of 'anonymous function'.

But isn't a function just a deferred expression with a name :-)

As a person who started out writing assembler where every "function" I 
wrote was a macro that got expanded inline, the distinction is kinda 
blurry to me.

> Python's lambda expressions *are* the former, but they are 
> emphatically *not* the latter.

Isn't that because lambdas have the limitation of not allowing 
statements, only expressions? I know this limitation avoids side-effects 
and has significance in some formal (functional?) languages... but is 
that what Python is? In the Python I use, lambdas are always used where 
you are too lazy to define a function to do its job.

To me, anonymous procedures/functions would be a superset of "deferred 
expressions", and if the one stone fits perfectly in the slingshot we 
have and can kill multiple birds... why hunt for another stone?

Oh yeah Raymond: on the "def defines some variable name"... are you 
joking? You forgot the smiley :-)

I don't get what the problem is with mixing statement and expression 
semantics... from a practical point of view, statements just offer a 
superset of expression functionality.

If there really is a serious practical reason why they must be limited 
to expressions, why not just raise an exception or something if the 
"anonymous function" is too complicated...

I did some fiddling and it seems lambdas can call methods and stuff 
that can have side effects, which kinda defeats what I thought was the 
point of "statements vs expressions"... I guess I just don't 
understand... maybe I'm just thick :-)

> Anyway, the AlternateLambdaSyntax Wiki page has a couple of relevant 
> entries under 'real closures'.

Where is that wiki BTW? I remember looking at it ages ago but can't find 
the link anymore.

--
Donovan Baarda


Re: [Python-Dev] Recommend accepting PEP 312 -- Simple Implicit Lambda

2005-06-18 Thread Donovan Baarda
Kay Schluehr wrote:
> Josiah Carlson wrote:
> 
>  > Kay Schluehr <[EMAIL PROTECTED]> wrote:
>  >
>  >
>  >> Maybe anonymus function closures should be pushed forward right now 
> not only syntactically? Personally I could live with lambda or several
>  >> of the alternative syntaxes listed on the wiki page.

I must admit I ended up deleting most of the "alternative to lambda" 
threads after they flooded my in box. So it is with some dread I post 
this, contributing to it...

As I see it, a lambda is an anonymous function. An anonymous function is 
a function without a name. We already have a syntax for a function... 
why not use it, ie:

  f = filter(def (a): return a > 1, [1,2,3])

The implications of this are that both functions and procedures can be 
anonymous. This also implies that unlike lamba's, anonymous functions 
can have statements, not just expressions. You can even do compound 
stuff like;

   f = filter(def (a): b=a+1; return b>1, [1,2,3])

or if you want you can use indenting;

   f = filter(def (a):
                  b = a+1
                  return b > 1, [1,2,3])

It also means the following becomes valid syntax;

f = def (a,b):
   return a>b

I'm not sure if there are syntactic ambiguities to this. I'm not sure if 
the CS boffins are disturbed by "side effects" from statements. 
Perhaps both can be resolved by limiting annonymous functions to 
expressions. Or require use of brackets or ";" to resolve ambiguity.

This must have been proposed already and shot down in flames... sorry 
for re-visiting old stuff and contributing noise.

--
Donovan Baarda


Re: [Python-Dev] Withdrawn PEP 288 and thoughts on PEP 342

2005-06-17 Thread Donovan Baarda
On Fri, 2005-06-17 at 13:53, Joachim Koenig-Baltes wrote:
[...]
> My use case for this is a directory tree walking generator that
> yields all the files including the directories in a depth first manner.
> If a directory satisfies a condition (determined by the caller) the
> generator shall not descend into it.
> 
> Something like:
> 
> DONOTDESCEND=1
> for path in mywalk("/usr/src"):
>     if os.path.isdir(path) and os.path.basename(path) == "CVS":
>         continue DONOTDESCEND
>     # do something with path
> 
> Of course there are different solutions to this problem with callbacks
> or filters but i like this one as the most elegant.

I have implemented almost exactly this use-case using the standard
Python generators, and shudder at the complexity something like this
would introduce.

For me, the right solution would be to either write your own generator
that "wraps" the other generator and filters it, or just make the
generator with additional (default value) parameters that support the
DONOTDESCEND filtering, something like the sketch below.
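
A rough sketch using os.walk (keeping the mywalk name from above; the
prune parameter is made up);

import os

def mywalk(top, prune=lambda path: False):
    for dirpath, dirs, files in os.walk(top):
        # trimming the dirs list in place stops os.walk descending
        dirs[:] = [d for d in dirs
                   if not prune(os.path.join(dirpath, d))]
        for name in dirs + files:
            yield os.path.join(dirpath, name)

for path in mywalk("/usr/src",
                   prune=lambda p: os.path.basename(p) == "CVS"):
    pass  # do something with path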

FWIW, my usecase is a directory comparison generator that walks two
directories producing tuples of corresponding files. It optionally will
not descend directories in either tree that do not have a corresponding
directory in the other tree. See;

http://minkirri.apana.org.au/~abo/projects/utils/

-- 
Donovan Baarda <[EMAIL PROTECTED]>



Re: [Python-Dev] Re: switch statement

2005-04-25 Thread Donovan Baarda
On Mon, 2005-04-25 at 21:21 -0400, Brian Beck wrote:
> Donovan Baarda wrote:
> > Agreed. I don't find any switch syntaxes better than if/elif/else. Speed
> > benefits belong in implementation optimisations, not new bad syntax.
> 
> I posted this 'switch' recipe to the Cookbook this morning, it saves
> some typing over the if/elif/else construction, and people seemed to
> like it. Take a look:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410692

Very clever... you have shown that current python syntax is capable of
almost exactly replicating a C case statement.

My only problem is C case statements are ugly. A simple if/elif/else is
much more understandable to me. 

The main benefit in C of case statements is the compiler can optimise
them. This copy of a C case statement will be slower than an
if/elif/else, and just as ugly :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Re: switch statement

2005-04-25 Thread Donovan Baarda
On Mon, 2005-04-25 at 18:20 -0400, Jim Jewett wrote:
[...]
> If speed for a limited number of cases is the only advantage, 
> then I would say it belongs in (at most) the implementation, 
> rather than the language spec.  

Agreed. I don't find any switch syntaxes better than if/elif/else. Speed
benefits belong in implementation optimisations, not new bad syntax.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-21 Thread Donovan Baarda
On Mon, 2005-03-21 at 23:31 +1100, Donovan Baarda wrote:
> On Mon, 2005-03-21 at 11:42 +0100, Peter Astrand wrote:
> > On Mon, 21 Mar 2005, Donovan Baarda wrote:
> > 
> > > > > The only ways to ensure that a select process does not block like 
> > > > > this,
> > > > > without using non-blocking mode, are;
> > 
> > > > 3) Use os.read / os.write.
> > > [...]
> > >
> > > but os.read / os.write will block too.
> > 
> > No.
> [...]
> 
> Hmmm... you are right... that changes things. Blocking vs non-blocking
> becomes kinda moot if read/write will do partial writes in blocking
> mode.
> 
> > fread() should loop internally on EAGAIN, in blocking mode.
> 
> Yeah, I was talking about non-blocking mode...

Actually, in blocking mode you never get EAGAIN; read() only gets
EAGAIN on an empty non-blocking read().

In non-blocking mode, EAGAIN is considered an error by fread(), so it
will return a partial read. The python implementation of file.read()
will return this partial read, and clear the EAGAIN error, or raise
IOError if it was an empty read (to differentiate between an empty read
and EOF).
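
You can see the difference with a pipe (a rough sketch);

import os, fcntl, errno

r, w = os.pipe()
f = os.fdopen(r)
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)
try:
    f.read()                       # nothing written yet: empty read
except IOError, e:
    print e.errno == errno.EAGAIN  # True: "would block", not EOF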

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-21 Thread Donovan Baarda
On Tue, 2005-03-22 at 12:49 +1200, Greg Ewing wrote:
> Donovan Baarda wrote:
> 
> > Consider the following. This is pretty much the only way you can use
> > popen2 reliably without knowing specific behaviours of the executed
> > command;
> > 
>  > ...
> >   fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK) # \
> > ...   # /
> >   fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)# \
> 
> I still don't believe you need to make these non-blocking.
> When select() returns a fd for reading/writing, it's telling
> you that the next os.read/os.write call on it will not block.
> Making the fd non-blocking as well is unnecessary and perhaps
> even undesirable.

Yeah... For some reason I had it in my head that os.read/os.write would
not do partial/incomplete reads/writes unless the file was in
non-blocking mode.

> > For 1) and 2), note that popen2 returns file objects, but as they cannot
> > be reliably used as file objects, we ignore them and grab their
> > fileno(). Why does popen2 return file objects if they cannot reliably be
> > used?
> 
> I would go along with giving file objects alternative read/write
> methods which behave more like os.read/os.write, maybe called
> something like readsome() and writesome(). That would eliminate
> the need to extract and manipulate the fds, and might make it
> possible to do some of this stuff in a more platform-independent
> way.

The fact that partial reads/writes are possible without non-blocking
mode changes things a fair bit. Also, the lack of fcntl support in
Windows needs to be taken into account too.

I still think the support for partial reads in non-blocking mode on
file.read() is inconsistent with the absence of partial write support in
file.write(). I think this PEP still has some merit for cleaning up this
inconsistency, but otherwise doesn't gain much... just adding a return
count to file.write() and clearing up the documented behaviour is enough
to do this.

The lack of support on win32 for non-blocking mode, combined with the
reduced need for it, makes adding a "setblocking" method undesirable.

I don't know what the best thing to do now is... I guess the
readsome/writesome is probably best, but given that os.read/os.write is
not that bad, perhaps it's best to just forget I even suggested this
PEP :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-21 Thread Donovan Baarda
On Mon, 2005-03-21 at 11:42 +0100, Peter Astrand wrote:
> On Mon, 21 Mar 2005, Donovan Baarda wrote:
> 
> > > > The only ways to ensure that a select process does not block like this,
> > > > without using non-blocking mode, are;
> 
> > > 3) Use os.read / os.write.
> > [...]
> >
> > but os.read / os.write will block too.
> 
> No.
[...]

Hmmm... you are right... that changes things. Blocking vs non-blocking
becomes kinda moot if read/write will do partial writes in blocking
mode.

> fread() should loop internally on EAGAIN, in blocking mode.

Yeah, I was talking about non-blocking mode...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Draft PEP to make file objects support non-blockingmode.

2005-03-21 Thread Donovan Baarda
G'day,

From: "Greg Ward" <[EMAIL PROTECTED]>
> On 18 March 2005, Donovan Baarda said:
[...]
> > Currently the built in file type does not support non-blocking mode very
> > well.  Setting a file into non-blocking mode and reading or writing to it
> > can only be done reliably by operating on the file.fileno() file descriptor.
> > This requires using the fcntl and os module file descriptor manipulation
> > methods.
>
> Is having to use fcntl and os really so awful?  At least it requires
> the programmer to prove he knows what he's doing putting this file
> into non-blocking mode, and that he really wants to do it.  ;-)

It's not that bad I guess... but then I'm proposing a very minor change to
fix it.

The bit that annoys me is popen2() and select() give this false sense of
"File Object compatability", when in reality you can't use them reliably
with file objects.

It is also kind of disturbing that file.read() actually does work in
non-blocking mode, but file.write() doesn't. The source for file.read()
shows a fair bit of effort towards making it work for non-blocking mode...
why not do the same for file.write()?

> > Details
> > =======
> >
> > The documentation of file.read() warns; "Also note that when in non-blocking
> > mode, less data than what was requested may be returned, even if no size
> > parameter was given".  An empty string is returned to indicate an EOF
> > condition.  It is possible that file.read() in non-blocking mode will not
> > produce any data before EOF is reached.  Currently there is no documented
> > way to identify the difference between reaching EOF and an empty
> > non-blocking read.
> >
> > The documented behaviour of file.write() in non-blocking mode is undefined.
> > When writing to a file in non-blocking mode, it is possible that not all of
> > the data gets written.  Currently there is no documented way of handling or
> > indicating a partial write.
>
> That's more interesting and a better motivation for this PEP.

The other solution to this of course is to simply say "file.read() and
file.write() don't work in non-blocking mode", but that would be a step
backwards for the current file.read().

> > file.read([size]) Changes
> > -------------------------
> >
> > The read method's current behaviour needs to be documented, so its actual
> > behaviour can be used to differentiate between an empty non-blocking read,
> > and EOF.  This means recording that IOError(EAGAIN) is raised for an empty
> > non-blocking read.
> >
> >
> > file.write(str) Changes
> > -----------------------
> >
> > The write method needs to have a useful behaviour for partial non-blocking
> > writes defined, implemented, and documented.  This includes returning how
> > many bytes of "str" are successfully written, and raising IOError(EAGAIN)
> > for an unsuccessful write (one that failed to write anything).
>
> Proposing semantic changes to file.read() and write() is bound to
> raise hackles.  One idea for soothing such objections: only make these
> changes active when setblocking(False) is in effect.  I.e., a
> setblocking(True) file (the default, right?) behaves as you described
> above, warts and all.  (So old code that uses fcntl() continues to
> "work" as before.)  But files that have had setblocking(False) called
> could gain these new semantics that you propose.

There is nothing in this proposal that would break or change the behaviour
of any existing code, unless it was relying on file.write() returning None,
or checking that file objects don't have a "setblocking" method.

Note that the change for file.read() is simply to document the current
behaviour... not to actually change it.

The change for file.write() is a little more dramatic, but I really can't
imagine anyone relying on file.write() returning None. A compromise would be
to have file.write() return None in blocking mode, and a count in
non-blocking mode... but I still can't believe people will rely on it
returning None :-) It would be more useful to always return a count, so that
methods using them could handle both modes easily.

Note that I did consider some more dramatic changes that would have made
them even easier to use. Things like raising an exception for EOF instead of
EAGAIN would actually make a lot of things easier to code... but it would be
too big a change.


Donovan Baarda    http://minkirri.apana.org.au/~abo/




Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-21 Thread Donovan Baarda
G'day,

From: "Peter Astrand" <[EMAIL PROTECTED]>
> On Mon, 21 Mar 2005, Donovan Baarda wrote:
[...]
> This is no "trap". When select() indicates that you can write or read, it
> means that you can write or read at least one byte. The .read() and
> .write() file methods, however, always writes and reads *everything*.
> These works, basically, just like fread()/fwrite().

yep, which is why you can only use them reliably in a select loop if you
read/write one byte at a time.

> > The only ways to ensure that a select process does not block like this,
> > without using non-blocking mode, are;
> >
> > 1) use a buffer size of 1 in the select process.
> >
> > 2) understand the child process's read/write behaviour and adjust the
> > selector process accordingly... ie by making the buffer sizes just right
> > for the child process,
>
> 3) Use os.read / os.write.
[...]

but os.read / os.write will block too. Try it... replace the file
read/writes in selector.py. They will only do partial reads if the file is
put into non-blocking mode.

> > I think the fread/fwrite and read/write behaviour is posix standard and
> > possibly C standard stuff... so it _should_ be the same on other
> > platforms.
>
> Sorry if I've misunderstood your point, but fread()/fwrite() does not
> return EAGAIN.

no, fread()/fwrite() will return 0 if nothing was read/written, and ferror()
will return EAGAIN to indicate that it was a "would block" condition...
at least I think it does... the man page simply says ferror() returns a
non-zero value.

Looking at the python implementation of file.read(), for an empty fread()
where ferror() is non-zero, it only raises IOError if errno is not EAGAIN or
EWOULDBLOCK. It blindly clearerr()'s for any other partial read.

The implementation of file.write() raises IOError whenever there is an
incomplete write.

So it looks, as I pointed out in the draft PEP, that the current file.read()
supports non-blocking mode, but file.write() doesn't... a bit asymmetric :-)


Donovan Baarda    http://minkirri.apana.org.au/~abo/




Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-21 Thread Donovan Baarda
On Mon, 2005-03-21 at 17:32 +1200, Greg Ewing wrote:
> > On 18 March 2005, Donovan Baarda said:
> 
> >>Many Python library methods and classes like select.select(), os.popen2(),
> >>and subprocess.Popen() return and/or operate on builtin file objects.
> >>However even simple applications of these methods and classes require the
> >>files to be in non-blocking mode.
> 
> I don't agree with that. There's no need to use non-blocking
> I/O when using select(), and in fact things are less confusing
> if you don't.

You would think that... and the fact that select, popen2 etc all use
file objects encourages you to think that. However, this is a trap that
can catch you out badly. Check the attached python scripts that
demonstrate the problem.

Because staller.py outputs and flushes a fragment of data smaller than
selector.py uses for its reads, the select statement is triggered, but
the corresponding read blocks.
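
In case the attachments don't survive the archive, the gist of them is
(rough sketches, not the actual attached code);

# staller.py (sketch): flush one byte, then go quiet
import sys, time
sys.stdout.write("x")
sys.stdout.flush()
time.sleep(60)

# selector.py (sketch): select() fires for the one byte, but the
# buffered file.read(2048) then blocks until the staller exits
import os, select
child_out = os.popen("python staller.py")
i, o, e = select.select([child_out], [], [])
data = child_out.read(2048)   # blocks here, despite select()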

A similar thing can happen with writes... if the child process consumes
a fragment smaller than the write buffer of the selector process, then
the select can trigger and the corresponding write can block because
there is not enough space in the file buffer.

The only ways to ensure that a select process does not block like this,
without using non-blocking mode, are;

1) use a buffer size of 1 in the select process.

2) understand the child process's read/write behaviour and adjust the
selector process accordingly... ie by making the buffer sizes just right
for the child process,

Note that it all interacts with the file objects' buffer sizes too...
making for some extremely hard to debug intermittent behaviour.

> >>The read method's current behaviour needs to be documented, so its actual
> >>behaviour can be used to differentiate between an empty non-blocking read,
> >>and EOF.  This means recording that IOError(EAGAIN) is raised for an empty
> >>non-blocking read.
> 
> Isn't that unix-specific? The file object is supposed to
> provide a more or less platform-independent interface, I
> thought.

I think the fread/fwrite and read/write behaviour is posix standard and
possibly C standard stuff... so it _should_ be the same on other
platforms.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/


selector.py
Description: application/python


staller.py
Description: application/python


Re: [Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-20 Thread Donovan Baarda
On Fri, 2005-03-18 at 20:41 -0500, James Y Knight wrote:
> On Mar 18, 2005, at 8:19 PM, Greg Ward wrote:
> > Is having to use fcntl and os really so awful?  At least it requires
> > the programmer to prove he knows what he's doing putting this file
> > into non-blocking mode, and that he really wants to do it.  ;-)

Consider the following. This is pretty much the only way you can use
popen2 reliably without knowing specific behaviours of the executed
command;

import os, fcntl, select

def process_data(cmd, data):
    child_in, child_out = os.popen2(cmd)
    child_in = child_in.fileno()                                  # \
    flags = fcntl.fcntl(child_in, fcntl.F_GETFL)                  # |1)
    fcntl.fcntl(child_in, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # /
    child_out = child_out.fileno()                                # \
    flags = fcntl.fcntl(child_out, fcntl.F_GETFL)                 # |2)
    fcntl.fcntl(child_out, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # /
    ans = ""
    li = [child_out]
    lo = [child_in]
    while li or lo:
        i, o, e = select.select(li, lo, [])          # 3
        if i:
            buf = os.read(child_out, 2048)           # 4
            if buf:
                ans += buf
            else:
                li = []                              # EOF from the child
        if o:
            if data:
                count = os.write(child_in, data[:2048])  # 4
                data = data[count:]
            else:
                os.close(child_in)  # done writing: let the child see EOF
                lo = []
    return ans

For 1) and 2), note that popen2 returns file objects, but as they cannot
be reliably used as file objects, we ignore them and grab their
fileno(). Why does popen2 return file objects if they cannot reliably be
used? The flags get/set using fcntl is arcane stuff for what are pretty
much essential operations after a popen2. These could be replaced by;

 child_in.setblocking(False)
 child_out.setblocking(False)

For 3), select() can operate on file objects directly. However, since
you cannot reliably read/write file objects in non-blocking mode, we use
the fileno's. Why can select operate with file objects if file objects
cannot be reliably read/written?

For 4), we are using the os.read/write methods to operate on the
fileno's. Under the proposed PEP we could use the file objects
read/write methods instead.

I guess the thing that annoys me the most is the asymmetry of popen2 and
select using file objects, but needing to use the os.read/write and
fileno()'s for reading and writing.

> I'd tend to agree. :) Moreover, I don't think fread/fwrite are 
> guaranteed to work as you would expect with non-blocking file 
> descriptors. So, providing a setblocking() call to files would require 
> calling read/write instead of fread/fwrite in all the file methods, at 
> least when in non-blocking mode. I don't think that's a good idea.

Hmm... from their observed behaviour I assumed file.read() and
file.write() were implemented using read/write. The documentation of
fread/fwrite doesn't mention the behaviour in non-blocking mode at all.
The observed behaviour suggests that fread/fwrite are implemented using
read/write and hence get the same behaviour. The documentation implies
that the behaviour in non-blocking mode will reflect the behaviour of
read/write, with EAGAIN errors reported via ferror() indicating empty
non-blocking reads/writes.

If the behaviour of fread/fwrite is indeed indeterminate under
non-blocking mode, then yes, file objects in non-blocking mode would
have to use read/write instead of fread/fwrite. However, I don't think
this is required.

I know this PEP is kinda insignificant and minor. It doesn't save much,
but it doesn't change much, and makes things a bit cleaner.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



[Python-Dev] Draft PEP to make file objects support non-blocking mode.

2005-03-17 Thread Donovan Baarda
G'day,

the recent thread about thread semantics for file objects reminded me I
had a draft pep for extending file objects to support non-blocking
mode. 

This is handy for handling files in async applications (the non-threaded
way of doing things concurrently).

Its pretty rough, but if I fuss over it any more I'll never get it
out...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
PEP: XXX
Title: Make builtin file objects support non-blocking mode
Version: $Revision: 1.0 $
Last-Modified: $Date: 2005/03/18 11:34:00 $
Author: Donovan Baarda <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jan-2005
Python-Version: 2.5
Post-History: 06-Jan-2005


Abstract
========

This PEP suggests a way that the existing builtin file type could be 
extended to better support non-blocking read and write modes required for 
asynchronous applications using things like select and popen2.


Rationale
=========

Many Python library methods and classes like select.select(), os.popen2(),
and subprocess.Popen() return and/or operate on builtin file objects.
However even simple applications of these methods and classes require the
files to be in non-blocking mode.

Currently the built in file type does not support non-blocking mode very
well.  Setting a file into non-blocking mode and reading or writing to it
can only be done reliably by operating on the file.fileno() file descriptor.
This requires using the fcntl and os module file descriptor manipulation
methods.


Details
=======

The documentation of file.read() warns; "Also note that when in non-blocking
mode, less data than what was requested may be returned, even if no size
parameter was given".  An empty string is returned to indicate an EOF
condition.  It is possible that file.read() in non-blocking mode will not
produce any data before EOF is reached.  Currently there is no documented
way to identify the difference between reaching EOF and an empty
non-blocking read.

The documented behaviour of file.write() in non-blocking mode is undefined.
When writing to a file in non-blocking mode, it is possible that not all of
the data gets written.  Currently there is no documented way of handling or
indicating a partial write.

The file.read() and file.write() methods are implemented using the
underlying C read() and write() functions.  As a side effect of this, they
have the following undocumented behaviour when operating on non-blocking
files;

A file.write() that fails to write all the provided data immediately will
write part of the data, then raise IOError with an errno of EAGAIN.  There
is no indication how much of the data was successfully written.

A file.read() that fails to read all the requested data immediately will
return the partial data that was read.  A file.read() that fails to read any
data immediately will raise IOError with an errno of EAGAIN.


Proposed Changes
================

What is required is to add a setblocking() method that simplifies setting
non-blocking mode, and to extend and document read() and write() so they
can be reliably used in non-blocking mode.


file.setblocking(flag) Extension
--------------------------------

This method implements the socket.setblocking() method for file objects.  If
flag is 0, the file is set to non-blocking, else to blocking mode.
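
For example, usage would look like this (hypothetical, since this method
does not yet exist on file objects)::

    import os

    child_in, child_out = os.popen2("some_cmd")
    child_in.setblocking(0)   # proposed: non-blocking, as for sockets
    child_out.setblocking(0)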


file.read([size]) Changes
-------------------------

The read method's current behaviour needs to be documented, so its actual
behaviour can be used to differentiate between an empty non-blocking read,
and EOF.  This means recording that IOError(EAGAIN) is raised for an empty
non-blocking read.


file.write(str) Changes
-----------------------

The write method needs to have a useful behaviour for partial non-blocking
writes defined, implemented, and documented.  This includes returning how
many bytes of "str" are successfully written, and raising IOError(EAGAIN)
for an unsuccessful write (one that failed to write anything).
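
A caller could then handle partial non-blocking writes like this (a
hypothetical sketch of the proposed semantics)::

    import errno

    def write_all(f, data):
        # under this proposal write() returns the count written, and
        # raises IOError(EAGAIN) when nothing could be written; a real
        # application would select() before retrying instead of spinning
        while data:
            try:
                n = f.write(data)
            except IOError, e:
                if e.errno != errno.EAGAIN:
                    raise
                n = 0
            data = data[n:]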


Impact of Changes
=================

As these changes are primarily extensions, they should not have much impact
on any existing code.

The file.read() changes only document current behaviour, so they can have
no impact on any existing code.

The file.write() change makes this method return an int instead of returning
nothing (None). The only code this could affect would be something relying
on file.write() returning None. I suspect there is no code that would do
this.

The file.setblocking() change adds a new method. The only existing code this
could affect is code that checks for the presence/absence of a setblocking
method on a file. There may be code out there that does this to
differentiate between a file and a socket. As there are much better ways to
do this, I suspect that there would be no code that does this.


Examples
========

For example, the following simple code using popen2 will "hang" if the
huge_in string 

Re: [Python-Dev] Re: No new features

2005-03-09 Thread Donovan Baarda
G'day again,

From: "Michael Hudson" <[EMAIL PROTECTED]>
> "Donovan Baarda" <[EMAIL PROTECTED]> writes:
>
> >
> > Just my 2c;
> >
> > I don't mind new features in minor releases, provided they meet the
> > following two criteria;
> >
> > 1) Don't break the old API! The new features must be pure extensions
> > that in no way change the old API. Any existing code should be
> > unaffected in any way by the change.
> >
> > 2) The new features must be clearly documented as "New in version
X.Y.Z".
> > This way people using these features will know the minium Python version
> > required for their application.
>
> No no no!  The point of what Anthony is saying, as I read it, is that
> experience suggests it is exactly this sort of change that should be
> avoided.  Consider the case of Mac OS X 10.2 which came with Python
> 2.2.0: this was pretty broken anyway because of some apple snafus but
> it was made even more useless by the fact that people out in the wild
> were writing code for 2.2.1 and using True/False/bool.  Going from
> 2.x.y to 2.x.y+1 shouldn't break anything, going from 2.x.y+1 to 2.x.y
> shouldn't break anything that doesn't whack into a bug in 2.x.y -- and
> "not having bool" isn't a bug in this sense.

You missed the "minor releases" bit in my post.

major releases, ie 2.x -> 3.0, are for things that can break existing code.
They change the API so that things that run on 2.x may not work with 3.x.

minor releases, ie 2.2.x -> 2.3.0, are for things that cannot break existing
code. They can extend the API such that code for 2.3.x may not work on
2.2.x, but code that runs on 2.2.x must work on 2.3.x.

micro releases, ie 2.2.1 -> 2.2.2, are for bug fixes only. There can be no
changes to the API, so that all code that runs on 2.2.2 should work with
2.2.1, barring the bugs fixed.

The example you cited of adding bool was an extension to the API, and hence
should have gone into a minor release, not a micro release.

I just read PEP 6, and it doesn't seem to use this terminology or make
this distinction... does something else do this anywhere? I thought this
approach was common knowledge...


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: No new features (was Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules ossaudiodev.c, 1.35, 1.36)

2005-03-09 Thread Donovan Baarda
G'day,

From: "Anthony Baxter" <[EMAIL PROTECTED]>
> On Wednesday 09 March 2005 12:21, Greg Ward wrote:
> > On 09 March 2005, Anthony Baxter said (privately):
> > > Thanks! I really want to keep the no-new-features thing going for
> > > as long as possible, pending any AoG (Acts of Guido), of course.
[...]
> Initially, I was inclined to be much less anal about the no-new-features
> thing. But since doing it, I've had a quite large number of people tell me
> how much they appreciate this approach - vendors, large companies with huge
> installed bases of Python, and also from people releasing software written
> in Python.  Very few people offer the counter argument as a general case -
> with the obvious exception that everyone has their "just this one little
> backported feature, please!" (I'm the same - there's been times where
> I've had new features I'd have loved to see in a bugfix release, just so I
> could use them sooner).

Just my 2c;

I don't mind new features in minor releases, provided they meet the
following two criteria;

1) Don't break the old API! The new features must be pure extensions that in
no way change the old API. Any existing code should be unaffected in any
way by the change.

2) The new features must be clearly documented as "New in version X.Y.Z".
This way people using these features will know the minimum Python version
required for their application.

Any change that breaks rule 1) must be delayed until a major release. Any
change that breaks rule 2) is a documentation bug that needs fixing.


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-18 Thread Donovan Baarda
From: "Martin v. Löwis" <[EMAIL PROTECTED]>
> Donovan Baarda wrote:
> > This patch keeps the current md5c.c, md5module.c files and adds the
> > following; _hashopenssl.c, hashes.py, md5.py, sha.py.
> [...]
> > If all we wanted to do was fix the md5 module
>
> If we want to fix the licensing issues with the md5 module, this patch
> does not help at all, as it keeps the current md5 module (along with
> its licensing issues). So any patch to solve the problem will need
> to delete the code with the questionable license.

It half fixes it, in that if Python is happy with the RSA one, Python can
continue to include it, and if Debian is unhappy with it, Debian can remove
it and build against openssl.

It doesn't fully fix the license problem. It is still worth considering
because it doesn't make things worse, and it does allow Python to use much
faster implementations and support other digest algorithms when openssl is
available.

> Then, the approach in the patch breaks the promise that the md5 module
> is always there. It would require that OpenSSL is always there - a
> promise that we cannot make (IMO).

It would be better if we found an alternative md5c.c. I found one that was
the libmd implementation that someone mildly tweaked and then slapped an
LGPL on. I have a feeling that would make the lawyers tremble more than the
"public domain" libmd one, unless they are happy that someone else is
prepared to wear the grief for slapping an LGPL onto something public domain.

Probably the best option at the moment is the sourceforge one, which is
listed as having a "zlib/libpng licence".


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] builtin_id() returns negative numbers

2005-02-18 Thread Donovan Baarda

From: "Armin Rigo" <[EMAIL PROTECTED]>
> Hi Tim,
>
>
> On Thu, Feb 17, 2005 at 01:44:11PM -0500, Tim Peters wrote:
> > >256 ** struct.calcsize('P')
> >
> > Now if you'll just sign and fax a Zope contributor agreement, I'll
> > upgrade ZODB to use this slick trick .
>
> I hereby donate this line of code to the public domain :-)

Damn... we can't use it then!

Seriously, on the Python lists there has been a discussion rejecting an
md5sum implementation because the author "donated it to the public domain".
Apparently lawyers have decided that you can't give code away. Intellectual
charity is illegal :-)
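
For anyone who wants the trick anyway, it folds id() into the unsigned
range of a pointer; a sketch, with uid() just being my name for it:

  import struct

  def uid(obj):
      # id() can be negative on some platforms; reducing it modulo
      # 2**(8 * sizeof(void *)) gives a stable unsigned value.
      return id(obj) % (256 ** struct.calcsize('P'))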


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-17 Thread Donovan Baarda
On Wed, 2005-02-16 at 22:53 -0800, Gregory P. Smith wrote:
> fyi - i've updated the python sha1/md5 openssl patch.  it now replaces
> the entire sha and md5 modules with a generic hashes module that gives
> access to all of the hash algorithms supported by OpenSSL (including
> appropriate legacy interface wrappers and falling back to the old code
> when compiled without openssl).
> 
>  
> https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470
> 
> I don't quite like the module name 'hashes' that i chose for the
> generic interface (too close to the builtin hash() function).  Other
> suggestions on a module name?  'digest' comes to mind.

I just had a quick look, and have these comments (pseudo patch review?).
Apologies for the noise on the list...

DESCRIPTION
===

This patch keeps the current md5c.c, md5module.c files and adds the
following; _hashopenssl.c, hashes.py, md5.py, sha.py.

The old md5 and sha extension modules get replaced by hashes.py, md5.py,
and sha.py python modules that leverage off _hash (openssl) or _md5 and
_sha (no openssl) extension modules.

The new _hash extension module "wraps" the high level openssl EVP
interface, which uses a string parameter to indicate what type of
message digest algorithm to use. The advantage of this is it makes all
openssl supported digests available, and if openssl adds more, we get
them for free. A disadvantage of this is it is an abstraction level
above the actual md5 and sha implementations, and this may add
overheads. These overheads are probably negligible compared to the
actual implementation speedups.

The new _md5 and _sha extension modules are simply re-named versions of
the old md5 and sha modules.

The hashes.py module acts as an import wrapper for _hash, and falls back
to using the _md5 and _sha modules if _hash is not available. It provides an
EVP style API (string hash name parameter) that supports only md5 and
sha hashes if openssl is not available.

The new md5.py and sha.py modules simply use hashes.py.
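
In other words, the fallback structure looks roughly like this (my sketch
of what the patch describes, not the patch code itself):

  try:
      import _hash                    # OpenSSL EVP wrapper
      def new(name, string=""):
          return _hash.new(name, string)
  except ImportError:
      import _md5, _sha               # renamed builtin modules
      _impls = {"md5": _md5.new, "sha": _sha.new}
      def new(name, string=""):
          return _impls[name](string)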

COMMENTS


The introduction of a "hashes" module with a new API that supports many
different digests (provided openssl is available) is extending Python,
not just "fixing the licenses" of md5 and sha modules.

If all we wanted to do was fix the md5 module, a simpler solution would
be to change the md5c.c API to match openssl's implementation, and make
md5module.c use it, conditionally compiling against md5c.c or linking
against openssl in setup.py. A similar approach could be used for sha,
but would require stripping the sha implementation out of shamodule.c

I am mildly concerned about the namespace/filespace clutter
introduced by this implementation... it feels unnecessary, as do the
tangled dependencies between the modules. With openssl, hashes.py duplicates
the functionality of _hash. Without openssl, md5.py and sha.py duplicate
_md5 and _sha, via a roundabout route through hashes.py.

The python wrappers seem overly complicated, with things like

  def new(name, string=None):
    if string:
      return _hash.new(name, string)
    else:
      return _hash.new(name)

being common where the following would suffice;

  def new(name, string=""):
    return _hash.new(name, string)

I think this is because _hash.new() uses an optional string parameter,
but I have a feeling a C update() with a zero-length string is faster than
this Python if. If it were a concern, the C implementation could check
the string length before calling update().

Given the convenience methods for different hashes in hashes.py (which
incidentally look like they are only available when _hash is not
available... something else that needs fixing), the md5.py module could
be simply coded as;

  from hashes import md5
  new = md5

Despite all these nit-picks, it looks pretty good. It is orders of
magnitude better than any of the other non-existent solutions, including
the one I didn't code :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-13 Thread Donovan Baarda
G'day,

On Sat, 2005-02-12 at 13:04 -0800, Gregory P. Smith wrote:
> On Sat, Feb 12, 2005 at 08:37:21AM -0500, A.M. Kuchling wrote:
> > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote:
> > > Are there any potential problems with making the md5sum module 
> > > availability
> > > "optional" in the same way as this?
> > 
> > The md5 module has been a standard module for a long time; making it
> > optional in the next version of Python isn't possible.  We'd have to
> > require OpenSSL to compile Python.
> > 
> > I'm happy to replace the MD5 and/or SHA implementations with other
> > code, provided other code with a suitable license can be found.
> > 
> 
> agreed.  it can not be made optional.  What I'd prefer (and will do if
> i find the time) is to have the md5 and sha1 module use OpenSSLs
> implementations when available.  Falling back to their built in ones
> when openssl isn't present.  That way its always there but uses the
> much faster optimized openssl algorithms when they exist.

So we need a fallback md5 implementation for when openssl is not
available.

The RSA implementation is not usable because it has an unsuitable
license. Looking at this licence again, I'm not sure what the problem
is. It allows you to freely modify, distribute, etc, with the only limit
being that you must retain the RSA licence blurb.

The libmd implementation cannot be used because the author tried to give
it away unconditionally, and the lawyers say you can't. (dumb! dumb!
dumb! someone needs to figure out a way to systematically get around
this kind of stupidity, perhaps have someone in a less legally stupid
country claim and re-license free code).

The libmd5-rfc sourceforge project implementation
<http://sourceforge.net/projects/libmd5-rfc/> looks OK. It needs to be
modified to have an API identical to openssl (rename
structures/functions). Then setup.py needs to be modified to use openssl
if available, or fallback to the provided libmd5-rfc implementation.
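
Something along these lines in setup.py would do it (a sketch; the
has_function() probe is just my guess at how to detect openssl):

  from distutils.core import Extension
  from distutils.ccompiler import new_compiler

  # Probe for OpenSSL's md5 by trying to link against libcrypto.
  if new_compiler().has_function('MD5_Init', libraries=['crypto']):
      md5_ext = Extension('md5', sources=['md5module.c'],
                          libraries=['crypto'])
  else:
      md5_ext = Extension('md5', sources=['md5module.c', 'md5c.c'])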

The SHA module is a bit different... it includes a built-in SHA
implementation. It might pay to strip out the implementation and give it
an openssl-like API, then make shamodule.c use it, or openssl if
available. Greg Smith might have already done much of this...

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c

2005-02-13 Thread Donovan Baarda
On Sat, 2005-02-12 at 17:35 -0800, Gregory P. Smith wrote:
> I've created an OpenSSL version of the sha module.  trivial to modify
> to be a md5 module.  Its a first version with cleanup to be done and
> such.  being managed in the SF patch manager:
> 
>  
> https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470
> 
> enjoy.  i'll do more cleanup and work on it soon.

Hmmm. I see the patch entry, but it seems to be missing the actual
patch.

Did you code this from scratch, or did you base it on the current
md5module.c? Is it using the openssl sha interface, or the higher level
EVP interface? 

The reason I ask is it would be pretty trivial to modify md5module.c to
use the openssl API for any digest, and would be less risk than
fresh-coding one.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-12 Thread Donovan Baarda
G'day again,

From: "Gregory P. Smith" <[EMAIL PROTECTED]>
> > I think it would be cleaner and simpler to modify the existing
> > md5module.c to use the openssl md5 layer API (this is just a
> > search/replace to change the function names). The bigger problem is
> > deciding what/how/whether to include the openssl md5 implementation
> > sources so that win32 can use them.
>
> yes, that is all i was suggesting.
>
> win32 python is already linked against openssl for the socket module
> ssl support, having the md5 and sha1 modules depend on openssl should
> not cause a problem.

IANAL... I have too much common sense, so I won't argue licences :-)

So is openssl already included in the Python sources, or is it just a
dependency? I had a quick look and couldn't find it so it must be a
dependency.

Given that Python is already dependent on openssl, it makes sense to change
md5sum to use it. I have a feeling that openssl internally uses md5, so this
way we won't link against two different md5sum implementations.


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-12 Thread Donovan Baarda
G'day,

From: "Bob Ippolito" <[EMAIL PROTECTED]>
> On Feb 11, 2005, at 6:11 PM, Donovan Baarda wrote:
[...]
> > Given that Python is already dependant on openssl, it makes sense to
> > change
> > md5sum to use it. I have a feeling that openssl internally uses md5,
> > so this
> > way we wont link against two different md5sum implementations.
>
> It is an optional dependency that is used when present (read: not just
> win32).  The sources are not included with Python.

Are there any potential problems with making the md5sum module availability
"optional" in the same way as this?

> OpenSSL does internally have an implementation of md5 (and sha1, among
> other things).

Yeah, I know, that's why it could be used for the md5sum module :-)

What I meant was that a Python application using ssl sockets and the md5sum
module will effectively have two different md5sum implementations in memory.
Using the openssl md5sum for the md5sum module will make it "leaner", as
well as faster.


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-10 Thread Donovan Baarda
On Fri, 2005-02-11 at 17:15 +1100, Donovan Baarda wrote:
[...]
> I think it would be cleaner and simpler to modify the existing
> md5module.c to use the openssl md5 layer API (this is just a
> search/replace to change the function names). The bigger problem is
> deciding what/how/whether to include the openssl md5 implementation
> sources so that win32 can use them.

Thinking about it, probably the best way is to include the libmd md5c.c
modified to use the openssl API, and then use configure to check for and
use openssl if it is available. That way win32 could use the provided
md5c.c, and other platforms could use the faster openssl.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-10 Thread Donovan Baarda
On Thu, 2005-02-10 at 23:13 -0500, Bob Ippolito wrote:
> On Feb 10, 2005, at 9:50 PM, Donovan Baarda wrote:
> 
> > On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote:
[...]
> > Only problem with this, is pyopenssl doesn't yet include any mdX or sha
> > modules.
> 
> My bad, how about M2Crypto <http://sandbox.rulemaker.net/ngps/m2/> 
> then?  This one supports message digests and is more license compatible 
> with Python to boot.
[...]

This one does have md5 support, but the Python API is rather different
from the current python md5sum API. It hooks into the slightly higher
level EVP openssl layer, rather than the lower level md5 layer. Hooking
into the EVP layer pretty much requires including all the openssl
message digest implementations (which may or may not be a good idea).

It also uses SWIG to generate the extension module. I don't think
anything else in Python itself uses SWIG, so starting to use it would
introduce a "Build Dependency".

I think it would be cleaner and simpler to modify the existing
md5module.c to use the openssl md5 layer API (this is just a
search/replace to change the function names). The bigger problem is
deciding what/how/whether to include the openssl md5 implementation
sources so that win32 can use them.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-10 Thread Donovan Baarda
On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote:
> On Feb 10, 2005, at 9:15 PM, Donovan Baarda wrote:
> 
> > On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote:
[...]
> One possible alternative would be to bring in something like PyOpenSSL 
> <http://pyopenssl.sourceforge.net/> and just rewrite the md5 (and sha?) 
> extensions as Python modules that use that API.

Only problem with this, is pyopenssl doesn't yet include any mdX or sha
modules.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-10 Thread Donovan Baarda
On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote:
> > The md5.h/md5c.c files allow "copy and use", but no modification of
> > the files. There are some alternative implementations, i.e. in glibc,
> > openssl, so a replacement should be sage. Any other requirements when
> > considering a replacement?

One thing to consider is "degree of difficulty" :-)

> > Matthias
> 
> I believe the "plan" for md5 and sha1 and such is to use the much
> faster openssl versions "in the future" (based on a long thread
> debating future interfaces to such things on python-dev last summer).
> That'll sidestep any tedious license issue and give a better
> implementation at the same time.  i don't believe anyone has taken the
> time to make such a patch yet.

I wasn't around for that discussion. There are two viable replacements
for the RSA implementation currently used; 

libmd <http://www.penguin.cz/~mhi/libmd/>
openssl <http://www.openssl.org/>.

The libmd implementation is by Colin Plumb and has the licence; "This
code is in the public domain; do with it what you wish." The API is
identical to the RSA implementation and the BSD world's libmd, and hence
is a drop-in replacement. This implementation is faster than the RSA
implementation.

The openssl implementation has an apache style license. The API is
almost the same but slightly different to the RSA API, so it would
require a little bit of work to make it fit. This implementation is the
fastest currently available, as it includes many platform specific
optimisations for a large range of platforms.

Currently md5c.c is included in the python sources. The libmd
implementation has a drop in replacement for md5c.c. The openssl
implementation is a complicated tangle of Makefile expanded template
code that would be harder to include in the Python sources.

In the Linux world, openssl is starting to become ubiquitous, so not
including it and statically or even dynamically linking against it is
feasible. However, using Python in other lands will probably require
something to be included.

Long term, I think openssl is the way to go. Short term, libmd is a
painless replacement that gets around the licencing issues.

I have been using the libmd API stuff for md4 in librsync, and am
looking at migrating to the openssl API. If people hassle me, I could
probably do the openssl API migration for Python, but I'm not sure what
the best approach would be to including the source in Python sources.

FWIW, I also have an md4sum module and md4c.c implementation that I'm
happy to contribute to Python (done for pysysnc).

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up function calls)

2005-01-31 Thread Donovan Baarda
On Tue, 2005-02-01 at 10:30 +1100, Donovan Baarda wrote:
> On Mon, 2005-01-31 at 15:16 -0500, Nathan Binkert wrote:
> > > Wouldn't it be nicer to have a facility that let you send messages
> > > between processes and manage concurrency properly instead?  You'll need
[...]
> A quick google search revealed this;
> 
> http://www.heise.de/ct/english/98/13/140/
> 
> Keeping in mind the high overheads of sharing memory between CPU's, the
> discussion about threads at this url seems to confirm; threads with
> shared memory are hard to distribute over multiple CPU's. Different OS's
> and/or thread implementations have tried (or just outright rejected)
> different ways of doing it, to varying degrees of success. IMHO, the
> fact that QNX doesn't distribute threads speaks volumes.

Sorry for replying to my reply, but I forgot the bit that brings it all
back On Topic :-)

The belief that the opcode granularity thread-switch driven by the GIL
is the cause of Python's threads being non-distributable is only half
true. 

Since OS's don't distribute threads well, any attempt to "Fix Python's
Threading" to make its threads distributable is a waste of time. The only
thing it might achieve would be to reduce the latency of thread switches,
maybe allowing faster response to OS events like signals. However, the
complexity introduced would cause more problems than it would fix, and
could easily result in worse performance, not better.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up function calls)

2005-01-31 Thread Donovan Baarda
On Mon, 2005-01-31 at 15:16 -0500, Nathan Binkert wrote:
> > Wouldn't it be nicer to have a facility that let you send messages
> > between processes and manage concurrency properly instead?  You'll need
> > most of this anyway to do multithreading sanely, and the benefit to the
> > multiple process model is that you can scale to multiple machines, not
> > just processors.  For brokering data between processes on the same
> > machine, you can use mapped memory if you can't afford to copy it
> > around, which gives you basically all the benefits of threads with
> > fewer pitfalls.
> 
> I don't think this is an answered problem.  There are plenty of
> researchers on both sides of this fence.  It is not been proven at all
> that threads are a bad model.
> 
> http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf or even
> http://www.python.org/~jeremy/weblog/030912.html

These are both threads vs events discussions (ie, threads vs an
async-event handler loop). This has nearly nothing to do with multiple
CPU utilisation. The real discussion for multiple CPU utilisation is
threads vs processes.

Once again, my knowledge of this is old and possibly out of date, but
threads do not scale well over multiple CPU's because all the threads
share the same memory. Multiple CPU hardware _can_ have physically
shared memory, but it is hardware hell keeping CPU caches in sync etc.
It is much easier to build a multi-CPU machine with separate memory for
each CPU, and high speed communication channels between each CPU. I
suspect most modern multi-CPU's use this architecture.

Assuming they have the separate-memory architecture, you get much better
CPU utilisation if you design your program as separate processes
communicating together, not threads sharing memory. In fact, it wouldn't
surprise me if most Operating Systems that support threads don't support
distributing threads over multiple CPU's at all.

A quick google search revealed this;

http://www.heise.de/ct/english/98/13/140/

Keeping in mind the high overheads of sharing memory between CPU's, the
discussion about threads at this url seems to confirm; threads with
shared memory are hard to distribute over multiple CPU's. Different OS's
and/or thread implementations have tried (or just outright rejected)
different ways of doing it, to varying degrees of success. IMHO, the
fact that QNX doesn't distribute threads speaks volumes.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Interpreter Thread Safety?

2005-01-28 Thread Donovan Baarda
On Sat, 2005-01-29 at 00:24 +0100, "Martin v. Löwis" wrote:
> Evan Jones wrote:
[...]
> The allocator is thread-safe in the presence of the GIL - you are
> supposed to hold the GIL before entering the allocator. Due to some
> unfortunate historical reasons, there is code which enters free()
> without holding the GIL - and that is what the allocator specifically
> deals with. Except for this single case, all callers of the allocator
> are required to hold the GIL.

Just curious; is that "one case" a bug that needs fixing, or is there some
reason this case can't be changed to use the GIL? Surely making it
mandatory for all free() calls to hold the GIL is easier than making the
allocator deal with the one case where this isn't done.

I like the GIL :-) so much so I'd like to see it visible at the Python
level. Then you could write your own atomic methods in Python.

BTW, if what Evan is hoping for is concurrent threads running on different
processors in a multiprocessor system, then don't :-)

It's been a while since I looked at multiprocessor architectures, but I
believe threading's shared memory paradigm will always be hard to
distribute efficiently over multiple CPU's. If you want to run on
multiple processors, use processes, not threads.

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6

2005-01-26 Thread Donovan Baarda
On Wed, 2005-01-26 at 01:53 +1100, Anthony Baxter wrote:
> On Wednesday 26 January 2005 01:01, Donovan Baarda wrote:
> > In this case it turns out to be "don't do exec() in a thread, because what
> > you exec can have all its signals masked". That turns out to be a hell of
> > a lot of things; popen, os.command, etc. They all only work OK in a
> > threaded application if what you are exec'ing doesn't use any signals.
> 
> Yep. You just have to be aware of it. We do a bit of this at work, and we
> either spool via a database table, or a directory full of spool files. 
> 
> > Actually, I've noticed that zope often has a sorta zombie "which" process
> > which it spawns. I wonder if this is a stuck thread waiting for some
> > signal...
> 
> Quite likely.

For the record, it seems that the java version also contributes. This
problem only occurs when you have the following combination;

Linux >=2.6
Python <=2.3
j2re1.4 =1.4.2.01-1 | kaffe 2:1.1.4xxx

If you use Linux 2.4, it goes away. If you use Python 2.4 it goes away.
If you use j2re1.4.1.01-1 it goes away.

For the problem to occur the following combination needs to occur;

1) Linux uses the thread's sigmask instead of the main thread/process
sigmask for the exec'ed process (ie, 2.6 does this, 2.4 doesn't).

2) Python needs to screw with the sigmask in threads (python 2.3 does,
python 2.4 doesn't).

3) The exec'ed process needs to rely on threads (j2re1.4 1.4.2.01-1
does, j2re1.4 1.4.1.01-1 doesn't).

It is hard to find old Debian deb's of j2re1.4 (1.4.1.01-1), and when
you do, you will also need the now non-existent j2se-common 1.1 package.
I don't know if this qualifies as a potential bug against j2re1.4
1.4.2.01-1.

For now my solution is to roll back to the older j2re1.4.


-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6

2005-01-25 Thread Donovan Baarda
G'day,

From: "Anthony Baxter" <[EMAIL PROTECTED]>
> On Thursday 20 January 2005 12:43, Donovan Baarda wrote:
> > On Wed, 2005-01-19 at 13:37 +, Michael Hudson wrote:
> > > The main oddness about python threads (before 2.3) is that they run
> > > with all signals masked.  You could play with a C wrapper (call
> > > setprocmask, then exec fop) to see if this is what is causing the
> > > problem.  But please try 2.4.
> >
> > Python 2.4 does indeed fix the problem. Unfortunately we are using Zope
> > 2.7.4, and I'm a bit wary of attempting to migrate it all from 2.3 to
> > 2.4. Is there any wa  this "Fix" can be back-ported to 2.3?
>
> It's extremely unlikely - I couldn't make myself comfortable with it
> when attempting to figure out it's backportedness. While the current
> behaviour on 2.3.4 is broken in some cases, I fear very much that
> the new behaviour will break other (working) code - and this is
> something I try very hard to avoid in a bugfix release, particularly
> in one that's probably the final one of a series.
>
> Fundamentally, the answer is "don't do signals+threads, you will
> get burned". For your application, you might want to instead try

In this case it turns out to be "don't do exec() in a thread, because what
you exec can have all its signals masked". That turns out to be a hell of a
lot of things; popen, os.command, etc. They all only work OK in a threaded
application if what you are exec'ing doesn't use any signals.

> something where you write requests to a file in a spool directory,
> and have a python script that loops looking for requests, and
> generates responses. This is likely to be much simpler to debug
> and work with.

Hmm, interprocess communications; great fun :-) And no spawning the process
from within the zope application; it's gotta be a separate daemon.

Actually, I've noticed that zope often has a sorta zombie "which" process
which it spawns. I wonder if this is a stuck thread waiting for some
signal...


Donovan Baarda
http://minkirri.apana.org.au/~abo/


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6

2005-01-20 Thread Donovan Baarda
On Thu, 2005-01-20 at 14:12 +, Michael Hudson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> writes:
> 
> > On Wed, 2005-01-19 at 13:37 +, Michael Hudson wrote:
> >> Donovan Baarda <[EMAIL PROTECTED]> writes:
[...]
> >> The main oddness about python threads (before 2.3) is that they run
> >> with all signals masked.  You could play with a C wrapper (call
> >> setprocmask, then exec fop) to see if this is what is causing the
> >> problem.  But please try 2.4.
> >
> > Python 2.4 does indeed fix the problem. 
> 
> That's good to hear.
[...]

I still don't understand what Linux 2.4 vs Linux 2.6 had to do with it.
Reading the man pages for execve(), pthread_sigmask() and sigprocmask(),
I can see some ambiguities, but mostly only if you do things they warn
against (ie, use sigprocmask() instead of pthread_sigmask() in a
multi-threaded app).

The man page for execve() says that the new process will inherit the
"Process signal mask (see sigprocmask() )". This implies to me it will
inherit the mask from the main process, not the thread's signal mask.

It looks like Linux 2.4 uses the signal mask of the main thread or
process for the execve(), whereas Linux 2.6 uses the thread's signal
mask. Given that execve() replaces the whole process, including all
threads, I dunno if using the thread's mask is right. Could this be a
Linux 2.6 kernel bug?

> > I'm not sure what the correct behaviour should be. The fact that it
> > works in python2.4 feels more like a byproduct of the thread mask change
> > than correct behaviour. 
> 
> Well, getting rid of the thread mask changes was one of the goals of
> the change.

I gathered that... which kinda means the fact that it fixed execvp in
threads is a side effect...(though I also guess it fixed a lot of other
things like this too).

> > To me it seems like execvp() should be setting the signal mask back
> > to defaults or at least the mask of the main process before doing
> > the exec.
> 
> Possibly.  I think the 2.4 change -- not fiddling the process mask at
> all -- is the Right Thing, but that doesn't help 2.3 users.  This has
> all been discussed before at some length, on python-dev and in various
> bug reports on SF.

Would a simple bug-fix for 2.3 be to have os.execvp() set the mask to
something sane before executing C execvp()? Given that Python does not
have any visibility of the procmask...

This might be a good idea regardless as it will protect against this bug
resurfacing in the future if someone decides fiddling with the mask for
threads is a good idea again.

> In your situation, I think the simplest thing you can do is dig out an
> old patch of mine that exposes sigprocmask + co to Python and either
> make a custom Python incorporating the patch and use that, or put the
> code from the patch into an extension module.  Then before execing
> fop, use the new code to set the signal mask to something sane.  Not
> pretty, particularly, but it should work.

The extension module that exposes sigprocmask() is probably best for
now...
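
Usage would look something like this (a sketch; "sigmask" is a made-up
name for such an extension module, not anything that exists):

  import os
  import sigmask    # hypothetical wrapper around sigprocmask()

  def execvp_unmasked(path, args):
      # Restore a sane (empty) signal mask before exec, so the new
      # program doesn't inherit the thread's fully-masked state.
      sigmask.sigprocmask(sigmask.SIG_SETMASK, [])
      os.execvp(path, args)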

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange segfault in Python threads and linux kernel 2.6

2005-01-19 Thread Donovan Baarda
On Wed, 2005-01-19 at 13:37 +, Michael Hudson wrote:
> Donovan Baarda <[EMAIL PROTECTED]> writes:
[...]
> You've left out a very important piece of information: which version
> of Python you are using.  I'm guessing 2.3.4.  Can you try 2.4?

Debian Python2.3 (2.3.4-18), Debian kernel-image-2.6.8-1-686 (2.6.8-10),
and Debian kernel-image-2.4.27-1-686 (2.4.27-6)

> I'd be astonished if this is the same bug.
> 
> The main oddness about python threads (before 2.3) is that they run
> with all signals masked.  You could play with a C wrapper (call
> setprocmask, then exec fop) to see if this is what is causing the
> problem.  But please try 2.4.

Python 2.4 does indeed fix the problem. Unfortunately we are using Zope
2.7.4, and I'm a bit wary of attempting to migrate it all from 2.3 to
2.4. Is there any way this "Fix" can be back-ported to 2.3?

Note that this problem is being triggered when using
Popen3() in a thread. Popen3() simply uses os.fork() and os.execvp().
The segfault is occurring in the execvp'ed process. I'm sure there must
be plenty of cases where this could happen. I think most people manage
to avoid it because the processes they are popen'ing or exec'ing happen
to not use signals.

After testing a bit, it seems the fork() in Popen3 is not a contributing
factor. The problem occurs whenever os.execvp() is executed in a thread.
It looks like the exec'ed command inherits the masked signals from the
thread.
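
The failing Popen3-style pattern boils down to this (a sketch; the fop
command and its arguments are just placeholders):

  import os, threading

  def spawn():
      pid = os.fork()
      if pid == 0:
          # On Python 2.3 + Linux 2.6 the child keeps the thread's
          # fully-masked sigmask, which breaks signal-using programs.
          os.execvp("fop", ["fop", "input.fo"])
      os.waitpid(pid, 0)

  threading.Thread(target=spawn).start()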

I'm not sure what the correct behaviour should be. The fact that it
works in python2.4 feels more like a byproduct of the thread mask change
than correct behaviour. To me it seems like execvp() should be setting
the signal mask back to defaults or at least the mask of the main
process before doing the exec.

> > BTW, built in file objects really could use better non-blocking
> > support... I've got a half-drafted PEP for it... anyone interested in
> > it?
> 
> Err, this probably should be in a different mail :)

The verboseness of the attached test code because of this issue prompted
that comment... so vaguely related :-)

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Strange segfault in Python threads and linux kernel 2.6

2005-01-18 Thread Donovan Baarda
G'day,

I've Cc'ed this to zope-coders as it might affect other Zope developers
and it had me stumped for ages. I couldn't find anything on it anywhere,
so I figured it would be good to get something into google :-).

We are developing a Zope2.7 application on Debian GNU/Linux that is
using fop to generate pdf's from xml-fo data. fop is a java thing, and
we are using popen2.Popen3(), non-blocking mode, and a select loop to
write/read stdin/stdout/stderr. This was all working fine.

Then over the Christmas chaos, various things on my development system
were apt-get updated, and I noticed that java/fop had started
segfaulting. I tried running fop with the exact same input data from the
command line; it worked. I wrote a python script that invoked fop in
exactly the same way as we were invoking it inside zope; it worked. It
only segfaulted when invoked inside Zope.

I googled and tried everything... switched from j2re1.4 to kaffe, rolled
back to a previous version of python, re-built Zope, upgraded Zope from
2.7.2 to 2.7.4, nothing helped. Then I went back from a linux 2.6.8
kernel to a 2.4.27 kernel; it worked!

After googling around, I found references to recent attempts to resolve
some signal handling problems in Python threads. There was one post that
mentioned subtle differences between how Linux 2.4 and Linux 2.6 did
signals to threads.

So it seems this is a problem with Python threads and Linux kernel 2.6.
The attached program demonstrates that it has nothing to do with Zope.
Using it to run "fop-test /usr/bin/fop ..." reproduces the segfault. I found
what looks like a related bug report at
<http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=971213>.
Is this the same bug? Should I submit a new bug report? Is there any other way
I can help resolve this?

BTW, built in file objects really could use better non-blocking
support... I've got a half-drafted PEP for it... anyone interested in
it?

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/


test-fop.py
Description: application/python
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com