Hi Glenn,

On Wed, Jul 22, 2009 at 4:41 PM, ghtdak <gl...@tarbox.org> wrote:

>
>
> >
> > > My primary problem is that the Sage subprocess is blocking forever on
> > > the other side of the pipe when it's not computing... Therefore, I
> > > can't have a Sage sub-process that I'm using in the notebook that is
> > > also able to communicate with other processes as I can't
> > > asynchronously receive data (or get timing interrupts).  I've gotten
> > > around this in the past by using threads as it was the only choice I
> > > had.
> >
> > Thanks for the clarification.  Since I don't really understand the
> > problem, without further clarification I don't think it will get fixed
> > in the near future.
>
> Basically, the problem is that the Sage sub-process loses control when
> it's done servicing a request from the server.  Instead of entering an
> event loop, it blocks on the pipe.
>
> The alternative would be to return after any request to an event
> loop.  Clearly, the primary requester would be the notebook server,
> but if you had a general event loop, the user could register any
> number of other asynchronous sources or timers to respond to.
>
> The adjustment I propose is in the sub-process (and probably would
> extend this to the notebook but I'll withhold that for the time
> being).  Instead of initializing and entering the read-process-write
> cycle, where read blocks waiting for a request, the sub-process would
> initialize, register a callback to handle requests from the server
> pipe, and enter the event loop.  When a request came in, the primary
> sage callback would "do the right thing" and return the result, hence
> returning to the event loop.
>
> Now, if a user wanted to handle other events or timers, they would
> simply add those asynchronous systems to the event loop using whatever
> means were provided.  This is, essentially, the purpose of the Twisted
> Reactor.
>
> This sub-process adjustment could (maybe) be done initially without
> changing anything on the notebook itself.  The pipe callback would
> receive the string, call a slightly modified sub-process method, get
> the result and return it.  Even if there are pexpect elements in the
> sub-process, the callback could feed that.
>
> Of course, once you put an event loop in the sub-process, much of
> pexpect probably becomes unnecessary.  Seems to me that Twisted could
> do the spawning of the twisted sub-process and just shuffle strings
> across.  I don't really know pexpect but my guess is the things it
> handles are also handled by twisted (although I don't actually know).
> It's doubtful that there is sufficient complexity that pickle couldn't
> handle the marshalling of data across the pipe.
>

The essence of what you have described is the architecture of Codenode.
Without getting too much into a side-by-side comparison, here is how it
works in our terminology:

An Engine is a computation process (like what you call a Sage sub-process).
An Engine contains two parts. The first is an object representing the Python
(or Sage) interpreter
<http://github.com/codenode/codenode/blob/8747088439efce423cefd680eab2daf625ff1e5d/codenode/backend/kernel/engine/python/interpreter.py>,
with methods like evaluate and tab-complete. The second is a simple RPC
server
<http://github.com/codenode/codenode/blob/8747088439efce423cefd680eab2daf625ff1e5d/codenode/backend/kernel/engine/server.py>
that adapts the interpreter object to a network transport; the current
working implementation is an XML-RPC server.

The Interpreter object knows nothing about the RPC server, and the RPC
server only has to map its RPC methods to the interpreter object's methods.
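
Stripped to its essentials, that separation looks roughly like this (a
simplified sketch, not the actual code in the files linked above; the port
number and method bodies are illustrative):

    # Simplified sketch of the Engine's two parts.  The real implementations
    # are in interpreter.py and server.py linked above.
    import sys
    import StringIO
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    class Interpreter(object):
        """Represents the interpreter; knows nothing about any transport."""

        def __init__(self):
            self.namespace = {}

        def evaluate(self, source):
            """Execute source in the interpreter namespace; return its output."""
            out = StringIO.StringIO()
            old_stdout, sys.stdout = sys.stdout, out
            try:
                exec source in self.namespace
            finally:
                sys.stdout = old_stdout
            return out.getvalue()

        def complete(self, text):
            """Return tab-completion candidates for the given prefix."""
            return [name for name in self.namespace if name.startswith(text)]

    # The RPC server's only job is to map RPC method names onto the
    # interpreter object's methods.
    interpreter = Interpreter()
    server = SimpleXMLRPCServer(('localhost', 10104))   # port chosen arbitrarily
    server.register_function(interpreter.evaluate, 'evaluate')
    server.register_function(interpreter.complete, 'complete')
    server.serve_forever()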

A Backend server creates and manages Engines.
The Backend server is a Twisted event loop.
It spawns, monitors, and terminates Engine processes.
It relays all notebook AJAX requests to the appropriate Engine via an
XML-RPC client (which is non-blocking).

Since the Backend is a Twisted event loop, it handles all process
spawning/monitoring, AJAX requests, and XML-RPC requests totally
asynchronously.

It has no dependencies on the interpreter libraries that it runs, and does
not need to understand the data going in and out of the engines.
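
For example, the relay step amounts to a single non-blocking XML-RPC call
that returns a Deferred, roughly like this (a sketch only; the engine
address and the method name are just illustrations):

    # Sketch of the Backend relaying a notebook request to an Engine over
    # XML-RPC without blocking the event loop.
    from twisted.web.xmlrpc import Proxy

    def relay_evaluate(engine_port, source):
        """Forward source code to an Engine's XML-RPC server; returns a Deferred."""
        proxy = Proxy('http://localhost:%d/' % engine_port)
        d = proxy.callRemote('evaluate', source)
        # When the result arrives it is handed back to the waiting AJAX
        # response; on failure, the error message is returned instead.
        d.addErrback(lambda failure: failure.getErrorMessage())
        return d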

Have a look at how all the pieces currently work here
<http://github.com/codenode/codenode/tree/8747088439efce423cefd680eab2daf625ff1e5d/codenode/backend/kernel>.
I am currently improving this Backend concept and have a completely new
iteration of the code
<http://github.com/deldotdr/codenode/tree/8b7658ebc51e7ab005e950f373a129d15cfea504/codenode/backend>
(still in development) that should be a bit more digestible (it's cleaner,
among other improvements).


>
> Once this change was made, you'd have a full infrastructure with which
> to build much more flexible applications yet still have the  notebook
> interface.  This would also facilitate building distributed
> computation engines, data collectors etc.
>

Yes! Exactly. Combined with the third element (the Frontend web application
server, whose most important job is storing users' notebook data in a
database), you have distributed notebook computation. We have successfully
done deployments where the Frontend runs on one server and the Backend on
another, usually in another city!
When the next piece of development is finished, a Frontend will be capable
of supporting multiple Backends, again, no matter where they are on the
network.


>
> Furthermore, as things evolved, truly dynamic AJAX could be built
> because the underlying Sage process could be asynchronously receiving
> data, talking with other Sage processes, periodically polling other
> servers (e.g. yahoo finance)
>

This is interesting, and it sounds like something different from just a
"Computation Engine".
I interpret this as formalizing some kind of information base into a
service accessible by the notebook Engine processes.

The simple limit of this is formalizing how a process's OS environment
variables are set. For example, I might want to set up an environment
supporting scientific analysis, so I would configure numpy and scipy to be
on my Engine process's Python path.
This environment could also provide access to some kind of data set, like
one produced by an astrophysicist's galaxy simulation. The data set could be
a file, a set of files, etc. Or, in the large limit, and coming back to your
example, it could be a network service explicitly supported by the Backend,
with access permissions for each Engine process explicitly configurable.
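
In that simple limit, this might be nothing more than the Backend spawning
each Engine with a prepared environment, along these lines (a sketch only;
the paths, the GALAXY_DATASET variable, and the engine_server.py script
name are hypothetical, not real Codenode configuration):

    # Sketch: the Backend launching an Engine process with a configured
    # environment.  Paths and variable names are made up for illustration.
    import os
    from twisted.internet import reactor, protocol

    class EngineProcess(protocol.ProcessProtocol):
        def outReceived(self, data):
            print 'engine:', data

    env = os.environ.copy()
    env['PYTHONPATH'] = '/opt/scipy-stack/site-packages'   # put numpy/scipy on the path
    env['GALAXY_DATASET'] = '/data/simulations/run42'      # hand the Engine a data set

    reactor.spawnProcess(EngineProcess(), 'python',
                         ['python', 'engine_server.py'], env=env)
    reactor.run()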

I'm not sure it makes sense, architecturally, for an Engine process to run
an event loop, because its principal purpose is to execute arbitrary code,
and that must generally be assumed to be a blocking procedure. Maybe you can
elaborate more on this idea; it's very interesting.

Take a look at codenode <http://github.com/codenode/codenode/tree/master>!

-Dorian


>
> It seems this would be a fairly easy task and have substantial
> payoff.  I would have done it myself, but got bogged down when looking
> through the pexpect code and the strings shoved over the pipe.  I
> figured that a few hours with the author would save a lot of time but
> the runaway stack interfered.
>
> -glenn
>
> >
> > William
> >
> >
> >
> >
> >
> > > -glenn
> >
>
> >
>
