Re: [Python-Dev] microthreading vs. async io

2007-02-26 Thread Joachim König-Baltes
Adam Olsen wrote:
> That would depend on whether Joachim's wait() refers to the individual
> tasks' calls or the scheduler's call.  I assumed it referred to the
> scheduler.  In the basic form it would literally be select.select(),
> which has O(n) cost and often fairly large n.
The wait(events, timeout) call of a task would only mention the events
that the task is interested in. The wait() call yields that list to the 
scheduler.

The scheduler then analyzes the list of events that tasks are waiting for
and compares it to its last call to select/poll/kevent, continuing
tasks in round-robin fashion until all events have been delivered to
the waiting tasks. Only when the scheduler has no events left to deliver
(e.g. all tasks are waiting) does it make a new select/poll/kevent OS
call, with a timeout computed from the lowest timeout value of all the
tasks, so that each timeout can be delivered at the right time.

Joachim
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] microthreading vs. async io

2007-02-26 Thread Adam Olsen
On 2/25/07, Armin Rigo <[EMAIL PROTECTED]> wrote:
> Hi Adam,
>
> On Thu, Feb 15, 2007 at 06:17:03AM -0700, Adam Olsen wrote:
> > > E.g. a wait(events=[], timeout=-1) method would be sufficient
> > > for most cases, where an event would specify
> >
> > I agree with everything except this.  A simple function call would
> > have O(n) cost, thus being unacceptable for servers with many open
> > connections.  Instead you need it to maintain a set of events and let
> > you add or remove from that set as needed.
>
> I just realized that this is not really true in the present context.
> If the goal is to support programs that "look like" they are
> multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal,
> then most of the time the wait() function would be only called with a
> *single* event, rarely two or three, never more.  Indeed, in this model
> a large server is implemented with many microthreads: at least one per
> client.  Each of them blocks in a separate call to wait().  In each such
> call, only the events relevant to that client are mentioned.
>
> In other words, the cost is O(n), but n is typically 1 or 2.  It is not
> the total number of events that the whole application is currently
> waiting on.  Indeed, the scheduler code doing the real OS call (e.g. to
> select()) can collect the events in internal dictionaries, or in Poll
> objects, or whatever, and update these dictionaries or Poll objects with
> the 1 or 2 new events that a call to wait() introduces.  In this
> respect, the act of *calling* wait() already means "add these events to
> the set of all events that need waiting for", without the need for a
> separate API for doing that.

That would depend on whether Joachim's wait() refers to the individual
tasks' calls or the scheduler's call.  I assumed it referred to the
scheduler.  In the basic form it would literally be select.select(),
which has O(n) cost and often fairly large n.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-26 Thread Joachim König-Baltes
Armin Rigo wrote:
> I just realized that this is not really true in the present context.
> If the goal is to support programs that "look like" they are
> multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal,
> then most of the time the wait() function would be only called with a
> *single* event, rarely two or three, never more.  Indeed, in this model
> a large server is implemented with many microthreads: at least one per
> client.  Each of them blocks in a separate call to wait().  In each such
> call, only the events relevant to that client are mentioned.
>   
Yes exactly.

> In other words, the cost is O(n), but n is typically 1 or 2.  It is not
> the total number of events that the whole application is currently
> waiting on.  Indeed, the scheduler code doing the real OS call (e.g. to
> select()) can collect the events in internal dictionaries, or in Poll
> objects, or whatever, and update these dictionaries or Poll objects with
> the 1 or 2 new events that a call to wait() introduces.  In this
> respect, the act of *calling* wait() already means "add these events to
> the set of all events that need waiting for", without the need for a
> separate API for doing that.
>   
But as I'd like to make the event structure similar to the BSD kevent
structure, we could use a flag in the event structure that tells the
scheduler to consider it only once or to keep it in its dictionary;
then the task would not need to supply the event each time.
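One way to picture that flag: a per-task registry held by the scheduler, with kevent-style flags (the constant names mirror BSD kevent(2); the Event and Registry classes here are invented for illustration):

```python
# kevent-style flags on a Python-level event structure (values illustrative)
EV_ADD, EV_DELETE, EV_ONESHOT = 0x1, 0x2, 0x10

class Event:
    def __init__(self, ident, flags=EV_ADD):
        self.ident = ident      # e.g. a file descriptor
        self.flags = flags

class Registry:
    """Per-task event set kept inside the scheduler, so a task does not
    have to re-supply persistent events on every wait() call."""
    def __init__(self):
        self.events = {}
    def apply(self, ev):
        if ev.flags & EV_DELETE:
            self.events.pop(ev.ident, None)
        elif ev.flags & EV_ADD:
            self.events[ev.ident] = ev
    def fire(self, ident):
        ev = self.events.get(ident)
        if ev is not None and ev.flags & EV_ONESHOT:
            del self.events[ident]     # consumed once, then dropped
        return ev
```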

> [I have experimented myself with a greenlet-based system giving wrapper
> functions for os.read()/write() and socket.recv()/send(), and in this
> style of code we tend to simply spawn new greenlets all the time.  Each
> one looks like an infinite loop doing a single simple job: read some
> data, process it, write the result somewhere else, start again.  (The
> loops are not really infinite; e.g. if sockets are closed, an exception
> is generated, and it causes the greenlet to exit.)  So far I've managed
> to always wait on a *single* event in each greenlet, but sometimes it
> was a bit contrived and being able to wait on 2-3 events would be
> handy.]
>   
I do not spawn new greenlets all the time. Instead, my tasks either
wait() or call wrappers for read/write/send/recv... that implicitly
call wait(...) until enough data is available, and the wait(...) does
the yield to the scheduler, which can either continue other tasks or
call kevent/poll/select if no task is runnable.
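Such a wrapper is only a retry loop around the non-blocking call. A sketch (the `wait` parameter stands in for the scheduler-yielding primitive described above; its event format here is invented):

```python
import os

def read(fd, n, wait):
    """Retry a non-blocking os.read(), yielding to the scheduler via the
    (hypothetical) wait() whenever the fd is not yet readable."""
    while True:
        try:
            return os.read(fd, n)
        except BlockingIOError:
            # Not ready: declare interest and let other tasks run.
            wait(events=[("read", fd)])
```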

What I'd like to see in an API/library:

* a standard scheduler that is easily extensible
* an event structure/class that is easily extensible

E.g. I've extended the kevent structure for the scheduler to also
include channels similar to Stackless. These are Python-only
communication structures, so there is no OS support for blocking on
them, but the scheduler can decide if there is something available for
a task that waits on a channel. The channels are therefore checked
first in the scheduler to see if a task can continue, and only if no
channel event is available does the scheduler call kevent/select/poll.
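The channel check is cheap precisely because no OS call is involved. A sketch of the idea (the Channel class and `pick_runnable` helper are invented names, not the actual implementation):

```python
from collections import deque

class Channel:
    """Python-only communication structure: the scheduler can check
    ready() cheaply before ever making a blocking OS call."""
    def __init__(self):
        self.items = deque()
    def send(self, value):
        self.items.append(value)
    def ready(self):
        return bool(self.items)
    def receive(self):
        return self.items.popleft()

def pick_runnable(channels):
    # Channels are inspected first; only if none is ready would the
    # scheduler fall back to kevent/select/poll.
    for ch in channels:
        if ch.ready():
            return ch
    return None
```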

While the scheluler blocks in kevent/select/poll, nothing happens on the 
channels as no
task is running, so the scheduler never blocks (inside the OS) 
unnecessarily.

Joachim


Re: [Python-Dev] microthreading vs. async io

2007-02-25 Thread Armin Rigo
Hi Adam,

On Thu, Feb 15, 2007 at 06:17:03AM -0700, Adam Olsen wrote:
> > E.g. a wait(events=[], timeout=-1) method would be sufficient
> > for most cases, where an event would specify
> 
> I agree with everything except this.  A simple function call would
> have O(n) cost, thus being unacceptable for servers with many open
> connections.  Instead you need it to maintain a set of events and let
> you add or remove from that set as needed.

I just realized that this is not really true in the present context.
If the goal is to support programs that "look like" they are
multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal,
then most of the time the wait() function would be only called with a
*single* event, rarely two or three, never more.  Indeed, in this model
a large server is implemented with many microthreads: at least one per
client.  Each of them blocks in a separate call to wait().  In each such
call, only the events relevant to that client are mentioned.

In other words, the cost is O(n), but n is typically 1 or 2.  It is not
the total number of events that the whole application is currently
waiting on.  Indeed, the scheduler code doing the real OS call (e.g. to
select()) can collect the events in internal dictionaries, or in Poll
objects, or whatever, and update these dictionaries or Poll objects with
the 1 or 2 new events that a call to wait() introduces.  In this
respect, the act of *calling* wait() already means "add these events to
the set of all events that need waiting for", without the need for a
separate API for doing that.
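Armin's point, that calling wait() *is* the registration, can be sketched as an O(1) update of the scheduler's internal dictionaries (all names here are hypothetical):

```python
class Scheduler:
    """Sketch: no separate add/remove API; the act of calling wait()
    registers the 1-2 new events directly."""
    def __init__(self):
        self.readers = {}   # fd -> task, built up incrementally

    def wait_readable(self, task, fd):
        # Called from the task's wait(): an O(1) update of the poll set,
        # not a rebuild over all events in the whole application.
        self.readers[fd] = task

    def deliver(self, fd):
        # On readiness, the entry is consumed and the task resumed.
        return self.readers.pop(fd, None)
```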

[Actually, I think that the simplicity of the wait(events=[]) interface
over any add/remove/callback APIs is an argument in favor of the
"microthread-looking" approach in general, though I know that it's a
very subjective topic.]

[I have experimented myself with a greenlet-based system giving wrapper
functions for os.read()/write() and socket.recv()/send(), and in this
style of code we tend to simply spawn new greenlets all the time.  Each
one looks like an infinite loop doing a single simple job: read some
data, process it, write the result somewhere else, start again.  (The
loops are not really infinite; e.g. if sockets are closed, an exception
is generated, and it causes the greenlet to exit.)  So far I've managed
to always wait on a *single* event in each greenlet, but sometimes it
was a bit contrived and being able to wait on 2-3 events would be
handy.]


A bientot,

Armin.


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread dustin
On Thu, Feb 15, 2007 at 11:47:27AM -0500, Jean-Paul Calderone wrote:
> Is the only problem here that this style of development hasn't been made
> visible enough?

Yep -- I looked pretty hard about two years ago, and although I haven't
been looking for that specifically since, I haven't heard anything about
it.

The API docs don't provide a good way to find things like this, and the
Twisted example tutorial didn't mention it at my last check.

So if we have an in-the-field implementation of this style of
programming (call it what you will), is it worth *standardizing* that
style so that it's the same in Twisted, my library, and anyone else's
library that cares to follow the standard?  

Dustin


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> Is the only problem here that this style of development hasn't been made
> visible enough?

Perhaps not the only problem, but definitely a big part of it.  I
looked for such a thing in twisted after python 2.5 came out and was
unable to find it.  If I had I may not have bothered to update my own
microthreads to use python 2.5 (my proof-of-concept was based on real
threads).

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Joachim König-Baltes <[EMAIL PROTECTED]> wrote:
> Adam Olsen schrieb:
> > I don't think we're on the same page then.  The way I see it you want
> > a single async IO implementation shared by everything while having a
> > collection of event loops that cooperate "just enough".  The async IO
> > itself would likely end up being done in C.
> >
> No, I'd like to have:
>
> - An interface for a task to specify the events it's interested in, and
> waiting
>   for at least one of the events (with a timeout).
> - an interface for creating a task (similar to creating a thread)
> - an interface for a scheduler to manage the tasks

My tasks are transient and only wait on one thing at a time (if not
waiting for the scheduler to let them run!).  I have my own semantics
for creating tasks that incorporate exception propagation.  My
exception propagation (and return handling) require a scheduler with
specific support for them.  Net result is that I'd have to wrap all
you provide, if not monkey-patching it because it doesn't provide the
interfaces to let me wrap it properly.

All I want is a global select.poll() object that all the event loops
can hook into and each will get a turn to run after each call.

Well, that, plus I want it to work around all the platform-specific
peculiarities.
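The "global poll object that every loop hooks into" could be as small as this sketch (the class and method names are invented; only `select.poll()` itself is real):

```python
import select

class SharedPoll:
    """One shared select.poll(); after each OS call, every registered
    event loop gets a turn with the ready list."""
    def __init__(self):
        self.poller = select.poll()
        self.loops = []

    def register_loop(self, loop):
        self.loops.append(loop)   # loop: a callable taking [(fd, mask)]

    def turn(self, timeout_ms=0):
        ready = self.poller.poll(timeout_ms)
        for loop in self.loops:
            loop(ready)           # each loop runs once per OS call
```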


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Phillip J. Eby
At 11:47 AM 2/15/2007 -0500, Jean-Paul Calderone wrote:
>Is the only problem here that this style of development hasn't been made
>visible enough?

You betcha.  I sure as heck wouldn't have bothered writing the module I 
did, if I'd known it already existed.  Or at least I'd only have written 
whatever parts the module doesn't cover.  The Twisted used by the current 
version of Chandler doesn't include this feature yet, though, AFAICT.

But this is excellent; it means people will be able to write plugins that 
do network I/O without needing to grok CPS.  They'll still need to be able 
to grok some basics (like not blocking the reactor), but this is good to 
know about.  Now I won't have to actually test that module I wrote.  ;-)

You guys should be trumpeting this - it's real news and in fact a motivator 
for people to upgrade to Python 2.5 and whatever version of Twisted 
supports this.  You just lifted a *major* barrier to using Twisted for 
client-oriented tasks, although I imagine there probably needs to be even 
more of it.



Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Jean-Paul Calderone
On Thu, 15 Feb 2007 10:36:21 -0600, [EMAIL PROTECTED] wrote:
> [snip]
>
>def fetchSequence(...):
>  fetcher = Fetcher()
>  yield fetcher.fetchHomepage()
>  firstData = yield fetcher.fetchPage('http://...')
>  if someCondition(firstData):
>    while True:
>      secondData = yield fetcher.fetchPage('http://...')
>      # ...
>      if someOtherCondition(secondData): break
>  else:
>    # ...

Ahem:

from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import getPage

@inlineCallbacks
def fetchSequence(...):
    homepage = yield getPage(homepage)
    firstData = yield getPage(anotherPage)
    if someCondition(firstData):
        while True:
            secondData = yield getPage(wherever)
            if someOtherCondition(secondData):
                break
    else:
        ...

So as I pointed out in another message in this thread, for several years it
has been possible to do this with Twisted.  Since Python 2.5, you can do it
exactly as I have written above, which looks exactly the same as your example
code.

Is the only problem here that this style of development hasn't been made
visible enough?

Jean-Paul


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread dustin
On Thu, Feb 15, 2007 at 04:51:30PM +0100, Joachim König-Baltes wrote:
> The style used in asyncore, inheriting from a class and calling return 
> in a method
> and being called later at a different location (different method) just 
> interrupts the
> sequential flow of operations and makes it harder to understand. The same is
> true for all other strategies using callbacks or similar mechanisms.
> 
> All this can be achieved with a multilevel yield() that is hidden in a 
> function call.
> So the task does a small step down (wait) in order to jump up (yield) to 
> the scheduler
> without disturbing the eye of the beholder.

I agree -- I find that writing continuations or using asyncore's
structure makes spaghetti out of functionality that requires multiple
blocking operations inside looping or conditional statements.  The best
example, for me, was writing a complex site-specific web spider that had
to fetch 5-10 pages in a certain sequence, where each step in that
sequence depended on the results of the previous fetches.  I wrote it in
Twisted, but the proliferation of nested callback functions and chained
deferreds made my head explode while trying to debug it.  With a decent
microthreading library, that could look like:

def fetchSequence(...):
  fetcher = Fetcher()
  yield fetcher.fetchHomepage()
  firstData = yield fetcher.fetchPage('http://...')
  if someCondition(firstData):
    while True:
      secondData = yield fetcher.fetchPage('http://...')
      # ...
      if someOtherCondition(secondData): break
  else:
    # ...

which is *much* easier to read and debug.  (FWIW, after I put my head
back together, I rewrote the app with threads, and it now looks like the
above, without the yields.  Problem is, throttling on fetches means 99%
of my threads are blocked on sleep() at any given time, which is just
silly).

All that said, I continue to contend that the microthreading and async
IO operations are separate.  The above could be implemented relatively
easily in Twisted with a variant of the microthreading module Phillip
posted earlier.  It could also be implemented atop a bare-bones
microthreading module, with Fetcher using asyncore on the backend, or
even scheduling urllib.urlopen() calls into OS threads.  Presumably, it
could run in NanoThreads and Kamaelia too, among others.

What I want is a consistent syntax for microthreaded code, so that I
could write my function once and run it in *all* of those circumstances.

Dustin

P.S. For the record -- I've written lots of other apps in Twisted with
great success; this one just wasn't a good fit.


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Phillip J. Eby
At 11:00 PM 2/14/2007 -0600, [EMAIL PROTECTED] wrote:
>Instead, I would like to concentrate on producing a small, clean,
>consistent, generator-based microthreading library.  I've seen several
>such libraries (including the one in PEP 342, which is fairly skeletal),
>and they all work *almost* the same way, but differ in, for example, the
>kinds of values that can be yielded, their handling of nested calls, and
>the names for the various "special" values one can yield.

Which is one reason that any "standard" coroutine library would need to be 
extensible with respect to the handling of yielded values, e.g. by using a 
generic function to implement the yielding.  See the 'yield_to' function in 
the example I posted.

Actually, the example I posted would work fine as a microthreads core by 
adding a thread-local variable that points to some kind of "scheduler" to 
replace the Twisted scheduling functions I used.  It would need to be a 
variable, because applications would have to be able to replace it, e.g. 
with the Twisted reactor.

In other words, the code I posted isn't really depending on Twisted for 
anything but reactor.callLater() and the corresponding .isActive() and 
.cancel() methods of the objects it returns.  If you add a default 
implementation of those features that can be replaced with the Twisted 
reactor, and dropped the code that deals with Deferreds and TimeoutErrors, 
you'd have a nice standalone library.
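A default implementation of just those three features could be quite small. A sketch (the `MiniReactor` class and its `run_until_empty` method are invented; only `callLater`, `isActive` and `cancel` mirror the Twisted names Phillip mentions):

```python
import heapq
import time

class _Call:
    def __init__(self, when, func, args):
        self.when, self.func, self.args = when, func, args
        self.active = True
    def isActive(self):
        return self.active
    def cancel(self):
        self.active = False

class MiniReactor:
    """Standalone stand-in for reactor.callLater() and the isActive()/
    cancel() methods of the handle it returns."""
    def __init__(self):
        self.calls = []
        self.seq = 0          # tie-breaker so _Call objects never compare
    def callLater(self, delay, func, *args):
        call = _Call(time.monotonic() + delay, func, args)
        self.seq += 1
        heapq.heappush(self.calls, (call.when, self.seq, call))
        return call
    def run_until_empty(self):
        while self.calls:
            when, _, call = heapq.heappop(self.calls)
            if not call.active:
                continue      # cancelled calls are simply skipped
            time.sleep(max(0.0, when - time.monotonic()))
            call.active = False
            call.func(*call.args)
```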



Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Joachim König-Baltes
Joachim König-Baltes wrote:
> The problem solved by this approach is to allow a number of cooperating
> threads to wait for an event without the need to busy-loop or block, by
> delegating the waiting to a central instance, the scheduler. How this
> efficient waiting is implemented is the responsibility of the scheduler,
> but the scheduler would not do the (possibly blocking) IO operation; it
> would only guarantee to continue a task when it can do an IO operation
> without blocking.
>
>   
From the point of view of the task, it only has to sprinkle a number
of wait(...) calls before doing blocking calls, so there is no need to
use callbacks or to write the inherently sequential code upside down.
That is the gain I'm interested in.

The style used in asyncore, inheriting from a class, calling return in
a method and being called back later at a different location (a
different method), just interrupts the sequential flow of operations
and makes it harder to understand. The same is true for all other
strategies using callbacks or similar mechanisms.

All this can be achieved with a multilevel yield() that is hidden in a
function call. So the task does a small step down (wait) in order to
jump up (yield) to the scheduler, without disturbing the eye of the
beholder.

Joachim




Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Joachim König-Baltes
Adam Olsen schrieb:
> I don't think we're on the same page then.  The way I see it you want
> a single async IO implementation shared by everything while having a
> collection of event loops that cooperate "just enough".  The async IO
> itself would likely end up being done in C.
>
No, I'd like to have:

- An interface for a task to specify the events it's interested in, and 
waiting
  for at least one of the events (with a timeout).
- an interface for creating a task (similar to creating a thread)
- an interface for a scheduler to manage the tasks

When a task resumes after a wait(...) it knows which of the events it was
waiting for have fired. It can then do whatever it wants: do the
low-level non-blocking IO on its own, or use something else. Of
course, as these are cooperative tasks, they still must be careful not
to block, e.g. not reading too much from a file descriptor that is
readable, but these problems have been solved in a lot of libraries,
and I would not urge the task to use a specific way to accomplish its
"task".

The problem solved by this approach is to allow a number of
cooperating threads to wait for an event without the need to busy-loop
or block, by delegating the waiting to a central instance, the
scheduler. How this efficient waiting is implemented is the
responsibility of the scheduler, but the scheduler would not do the
(possibly blocking) IO operation; it would only guarantee to continue
a task when it can do an IO operation without blocking.

Joachim



Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Joachim König-Baltes <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
> >> I have implemented something like the above, based on greenlets.
> >
> > I assume greenlets would be an internal implementation detail, not
> > exposed to the interface?
> Yes, you could use stackless, perhaps even Twisted,
> but I'm not sure if that would work because the requirement for the
> "reads single-threaded" is the simple wait(...) function call that does
> a yield
> (over multiple stack levels down to the function that created the task),
> something that is only provided by greenlet and stackless to my knowledge.

I don't think we're on the same page then.  The way I see it you want
a single async IO implementation shared by everything while having a
collection of event loops that cooperate "just enough".  The async IO
itself would likely end up being done in C.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Joachim König-Baltes
Adam Olsen wrote:
> I agree with everything except this.  A simple function call would
> have O(n) cost, thus being unacceptable for servers with many open
> connections.  Instead you need it to maintain a set of events and let
> you add or remove from that set as needed.
We can learn from kevent here: it already has EV_ADD, EV_DELETE,
EV_ENABLE, EV_DISABLE and EV_ONESHOT flags. So the event conditions
would stay "in the scheduler" (per task), so that they can fire
multiple times without needing to be handed over again and again.

Thanks, that's exactly the discussion I'd like to see, discussing about
a simple API.

>> I have implemented something like the above, based on greenlets.
>
> I assume greenlets would be an internal implementation detail, not
> exposed to the interface?
Yes, you could use Stackless, perhaps even Twisted, but I'm not sure
if that would work, because the requirement for the "reads
single-threaded" style is the simple wait(...) function call that does
a yield (over multiple stack levels, down to the function that created
the task), something that is only provided by greenlet and Stackless
to my knowledge.

Joachim



Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Joachim König-Baltes <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> E.g. a wait(events=[], timeout=-1) method would be sufficient
> for most cases, where an event would specify

I agree with everything except this.  A simple function call would
have O(n) cost, thus being unacceptable for servers with many open
connections.  Instead you need it to maintain a set of events and let
you add or remove from that set as needed.


> I have implemented something like the above, based on greenlets.

I assume greenlets would be an internal implementation detail, not
exposed to the interface?

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Joachim König-Baltes
[EMAIL PROTECTED] wrote:

[...]
> microthreading:
>   Exploiting language features to use cooperative multitasking in tasks
>   that "read" like they are single-threaded.
>
> asynchronous IO:
>   Performing IO to/from an application in such a way that the
>   application does not wait for any IO operations to complete, but
>   rather polls for or is notified of the readiness of any IO operations.
>
>   
[...]
> Asyncore *only* implements asynchronous IO -- any "tasks" performed in
> its context are the direct result of an IO operation, so it's hard to
> say it implements cooperative multitasking (and Josiah can correct me if
> I'm wrong, but I don't think it intends to).
>
> Much of the discussion here has been about creating a single, unified
> asynchronous IO mechanism that would support *any* kind of cooperative
> multitasking library.  I have opinions on this ($0.02 each, bulk
> discounts available), but I'll keep them to myself for now.
>   
Talking only about async I/O in order to write cooperative tasks that
"smell" single-threaded is too restricted, IMO.

If there are a number of cooperative tasks that "read"
single-threaded (or sequential), then the goal is to avoid a _blocking
operation_ in any of them, because the other tasks could do useful
things in the meantime.

But there are a number of different blocking operations, not only
async IO (which is easily handled by select()) but also:

- waiting for a child process to exit
- waiting for a posix thread to join()
- waiting for a signal/timer
- ...

Kevent (kernel event) on BSD, for example, tries to provide a common
infrastructure: a file descriptor onto which one can push some
conditions, and then select() until one of the conditions is met.
Unfortunately, thread joining is not covered by it, so if thread
joining is one of the conditions, one cannot wait (without some form
of busy looping) until one of the conditions is true; but for all the
other cases it would be possible.

There are many other similar approaches (libevent, notify, to name a few).

So in order to avoid blocking in a task, I'd prefer that the task:

- declaratively specifies what kind of conditions (events) it wants
to wait for (the API)

If that declaration is a function call, then this function could
implicitly yield, if the underlying implementation were Stackless- or
greenlet-based.

Kevent on BSD systems already has a usable API for defining the
conditions with structures, and there is also a Python module for it.

The important point IMO is to have an agreed API for declaring the
conditions a task wants to wait for. The underlying implementation in
a scheduler would be free to use whatever event library it wants.

E.g. a wait(events=[], timeout=-1) method would be sufficient for
most cases, where an event would specify:

- resource type (file, process, timer, signal, ...)
- resource id (fd, process id, timer id, signal number, ...)
- filter/flags (when to fire, e.g. writable, readable, exception for an fd, ...)
- ...

The result could be a list of events that have "fired", more or less
similar to the events in the argument list, but with added information
on the exact condition.

The task would return from wait(events) when at least one of the
conditions is met. The task then knows, e.g., that an fd is readable,
and can then do the read() on its own in the way it likes to do it,
without being forced to let some uber-framework do the low-level IO.
Just the waiting for conditions, without blocking the application, is
the important part.
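The proposed event structure might look like this (field names are illustrative, modeled on BSD kevent(2); the wait() signature is only the proposed shape, not an implementation):

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Sketch of the proposed event structure."""
    rtype: str            # resource type: 'file', 'process', 'timer', 'signal', ...
    ident: int            # resource id: fd, pid, timer id, signal number, ...
    filter: str = "read"  # when to fire: 'read', 'write', 'except', ...

def wait(events, timeout=-1):
    """Proposed signature only: would yield to the scheduler and return
    the subset of `events` that fired, annotated with details."""
    raise NotImplementedError
```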

I have implemented something like the above, based on greenlets.

In addition to the event types specified by BSD kevent(2), I've added
TASK and CHANNEL resource types for the events, so that I can wait for
tasks to complete, or send/receive messages to/from other tasks,
without blocking the application.

But the implementation is not the important thing, the API is; and
then we can start writing competing implementations.

Joachim


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Josiah Carlson

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> [EMAIL PROTECTED] schrieb:
> > Asyncore *only* implements asynchronous IO -- any "tasks" performed in
> > its context are the direct result of an IO operation, so it's hard to
> > say it implements cooperative multitasking (and Josiah can correct me if
> > I'm wrong, but I don't think it intends to).
> 
> I'm trying to correct you: By your definition, asyncore implements
> cooperative multi-tasking. You didn't define 'task', but I understand
> it as 'separate activity'. With asyncore, you can, for example, run
> a web server and an IRC server in a single thread, as separate tasks,
> and asyncore deals with the scheduling between these tasks.
> In your terminology, it is based on continuations: the chunk you specify
> is the event handler.
> 
> Indeed, asyncore's doc string starts with
> 
> # There are only two ways to have a program on a single
> # processor do "more than one thing at a time".
> 
> and goes on suggesting that asyncore provides one of them.

Well, when asyncore was written, Python probably didn't have coroutines,
generator trampolines, etc., i.e. what we would consider today, in this
particular context, "cooperative multithreading".

What that documentation /should/ have said is...

# There are at least two ways to have a program on a single
# processor do "more than one thing at a time".

Then it could go on to describe threads and a 'polling for events'
approach like asyncore, leaving the rest for someone else to add later.
I'll add that as my first patch to asyncore.

 - Josiah



Re: [Python-Dev] microthreading vs. async io

2007-02-14 Thread Martin v. Löwis
[EMAIL PROTECTED] schrieb:
> Asyncore *only* implements asynchronous IO -- any "tasks" performed in
> its context are the direct result of an IO operation, so it's hard to
> say it implements cooperative multitasking (and Josiah can correct me if
> I'm wrong, but I don't think it intends to).

I'm trying to correct you: By your definition, asyncore implements
cooperative multi-tasking. You didn't define 'task', but I understand
it as 'separate activity'. With asyncore, you can, for example, run
a web server and an IRC server in a single thread, as separate tasks,
and asyncore deals with the scheduling between these tasks.
In your terminology, it is based on continuations: the chunk you specify
is the event handler.

Indeed, asyncore's doc string starts with

# There are only two ways to have a program on a single
# processor do "more than one thing at a time".

and goes on suggesting that asyncore provides one of them.

Regards,
Martin



[Python-Dev] microthreading vs. async io

2007-02-14 Thread dustin
I've steered clear of this conversation for a while, because it drifted
a little bit off my original topic.  But I did want to straighten a
little bit of terminology out, if only to resolve some of my own
confusion over all the hubbub.  I don't pretend to define the words
others use; these definitions are mine, and apply to what I write below.

cooperative multitasking:
  Dynamically executing multiple tasks in a single thread; comes in
  two varieties:

continuations:
  Breaking tasks into discrete "chunks", and passing references to
  those chunks around as a means of scheduling.

microthreading:
  Exploiting language features to use cooperative multitasking in tasks
  that "read" like they are single-threaded.

asynchronous IO:
  Performing IO to/from an application in such a way that the
  application does not wait for any IO operations to complete, but
  rather polls for or is notified of the readiness of any IO operations.
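
A minimal demonstration of that definition with a pipe: the application
polls for readiness and only read()s once the data is known to be there,
so no call ever waits:

```python
import os
import select

r, w = os.pipe()
os.set_blocking(r, False)                        # read() will never wait

# Poll with a zero timeout: returns immediately, ready or not.
ready_before = select.select([r], [], [], 0)[0]  # -> [] (nothing written)

os.write(w, b"ping")
ready_after = select.select([r], [], [], 0)[0]   # -> [r]
data = os.read(r, 4) if r in ready_after else None
```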

Twisted is, by the above definitions, a continuation-based cooperative
multitasking library that includes extensive support for asynchronous
IO, all the way up the network stack for an impressive array of
protocols.  It does not itself implement microthreading, but Phillip
provided a nice implementation of such on top of Twisted[1].

Asyncore *only* implements asynchronous IO -- any "tasks" performed in
its context are the direct result of an IO operation, so it's hard to
say it implements cooperative multitasking (and Josiah can correct me if
I'm wrong, but I don't think it intends to).

Much of the discussion here has been about creating a single, unified
asynchronous IO mechanism that would support *any* kind of cooperative
multitasking library.  I have opinions on this ($0.02 each, bulk
discounts available), but I'll keep them to myself for now.

Instead, I would like to concentrate on producing a small, clean,
consistent, generator-based microthreading library.  I've seen several
such libraries (including the one in PEP 342, which is fairly skeletal),
and they all work *almost* the same way, but differ in, for example, the
kinds of values that can be yielded, their handling of nested calls, and
the names for the various "special" values one can yield.  

That similar modules are being written repeatedly, and presumably
applications and frameworks are being built on top of those modules,
suggests to me that a new "standard" implementation should be added to
the standard library.

I realize that I'm all talk and no code -- I've been busy, but I hope to
rectify the imbalance tonight.

Dustin

[1] http://mail.python.org/pipermail/python-dev/2007-February/071076.html