On Sun, Aug 23, 2009 at 9:17 AM, Brian Granger <ellisonbg....@gmail.com> wrote:
>
>> In the current architecture, a twistd daemon spawns a notebook server
>> which is responsible for doing "sage" stuff.  twistd is fully
>> asynchronous, but the notebook process itself is a pexpect based
>> blocking process connected with pipes to twistd.  As such, the block
>> on read by pexpect precludes the sage process servicing asynchronous
>> events.
>>
>> IMHO, this architecture is incorrect and limited... Perhaps this is
>> part of what is being rethought... if not, I believe it should be.
>
> As an avid Twisted user, I too thought this initially (why use pexpect, when 
> you could use Twisted).  But after looking at this issue further, I think 
> using pexpect is not that bad.  Here is why:
>
> 1.  If you were to use Twisted, while the process was running user's code, 
> Twisted would still block.  Using threads (running the Twisted event loop in 
> a thread) only partially solves this problem as the python interpreter can't 
> switch threads while non-GIL-releasing C/C++ code is running.  We ran into 
> this in early versions of IPython's parallel stuff - it worked great (asynch) 
> until the second we went to do something like diagonalize a matrix using 
> scipy.  Then everything would block.  We have had to work very hard to get 
> around this GIL induced limitation of using Twisted.
>
> 2.  Both dsage and parallel ipython clients use Twisted.  For this to work, 
> these clients need to run the Twisted reactor in a different thread than user 
> code is executed.  Currently, these work fine in the notebook, because they 
> can start the reactor in this way by themselves.  If the notebook itself used 
> Twisted, great care would need to be used to make sure these things still 
> worked.  You would have to run user code in the main thread and run all the 
> twisted stuff in a different thread.  User code needs to be in the main 
> thread if you want users to be able to run real GUI code (I do this 
> sometimes!).

The Sage notebook is a lot like the command line tools bash or screen
or even ssh.  The pexpect library is just a collection of Python
bindings to pseudotty that make it easy for one process to spawn and
run subprocesses.

Moreover, as long as the worksheet and the notebook server are
distinct processes (as they should be, IMHO), the difference between
using pexpect, or xmlrpc, or anything else, for them to communicate is
completely and totally irrelevant, since it is a black box to the
entire rest of the program.

Also, to correct another possible misconception, communication between
a process and a subprocess using pexpect is not blocking.  The
master process can listen to the subprocess for as long as it wants,
then stop listening.  That's why when you do

for i in range(10):
     sleep(1)
     print(i)

in the Sage notebook, you see the output as it is computed.  The
notebook server just uses pexpect to "peek" at the output of the
subprocess doing the actual work and look to see what has been output
so far.
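
To make the non-blocking point concrete, here is a minimal sketch of
the same idea.  It is not the notebook's actual code: it uses only
Python's standard pty and select modules in place of pexpect, the names
peek and run_and_watch are made up for illustration, and it assumes a
Unix-like system.

```python
import os
import pty
import select
import subprocess
import sys


def peek(fd, timeout):
    """Return output available on fd within `timeout` seconds, or b'' if none."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return b""
    try:
        return os.read(fd, 4096)
    except OSError:
        # On Linux, reading the master end after the child is gone raises EIO.
        return b""


def run_and_watch(code, poll_interval=0.05):
    """Run Python `code` in a child on a pty; return (output, number_of_polls)."""
    master, slave = pty.openpty()
    child = subprocess.Popen([sys.executable, "-c", code],
                             stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # the parent only needs the master end
    chunks, polls = [], 0
    while True:
        data = peek(master, poll_interval)  # listen briefly, then stop listening
        polls += 1
        if data:
            chunks.append(data)
        elif child.poll() is not None:
            # Child has exited: collect anything it wrote just before exiting.
            while True:
                tail = peek(master, 0)
                if not tail:
                    break
                chunks.append(tail)
            break
        # Between polls the parent is free to do other work here.
    os.close(master)
    # The pty line discipline turns "\n" into "\r\n"; undo that for display.
    return b"".join(chunks).decode().replace("\r\n", "\n"), polls


if __name__ == "__main__":
    loop = "import time\nfor i in range(3):\n    time.sleep(0.2)\n    print(i)\n"
    output, polls = run_and_watch(loop)
    print(output, end="")
    print("polled", polls, "times")
```

The parent never sits in a blocking read: each call to peek returns
after at most poll_interval seconds whether or not the child has
printed anything, which is why partial output can be shown while the
loop is still running.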

Another misconception is that pexpect is restricted to local
processes.  It's easy to control a process via pexpect over the
network via ssh.   This has been in Sage since 2005, and can already
be used for worksheet subprocesses *now* as long as you have a shared
filesystem (just use the server_pool option).  Here is an example on
the command line.  I have ssh keys set up so I can do "ssh
sage.math.washington.edu" and log in without typing a password.  I
start Sage on my laptop in a coffee shop, and make a connection to a
remote Sage that gets started running on sage.math, and I run a
calculation.

flat:sageuse wstein$ sage
----------------------------------------------------------------------
| Sage Version 4.1.1, Release Date: 2009-08-14                       |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: s = Sage(server="sage.math.washington.edu")
No remote temporary directory (option server_tmpdir) specified, using
/tmp/ on sage.math.washington.edu
sage: s.eval("2+2")
'4'
sage: s.eval("os.system('uname -a')")
'Linux sage.math.washington.edu 2.6.24-23-server #1 SMP Wed Apr 1
22:14:30 UTC 2009 x86_64 GNU/Linux\n0'
sage:


The above used pexpect.  You can even interact with remote objects:

sage: e = s("EllipticCurve([1..5])")
sage: e.rank()
1

You can do the same with Mathematica, etc. by the way:

sage: s = Mathematica(server="sage.math.washington.edu")
sage: s("Factorial[50]")
30414093201713378043612608166064768844377641568960512000000000000


Compare my laptop to sage.math's mathematica:

sage: s("Timing[Factorial[10^6]][[1]]")                     # sage.math
1.1099999999999999
sage: mathematica("Timing[Factorial[10^6]][[1]]")  # laptop
0.8902620000000001

(I guess Mathematica 7.0 is faster at factorials than Mathematica 6.0.)

This tests latency:

sage: timeit('s.eval("2+2")')        # over web via ssh
5 loops, best of 3: 56.3 ms per loop
sage: timeit('mathematica.eval("2+2")')  # local
625 loops, best of 3: 209 µs per loop

Of course latency is high over the net, since I'm in a random coffee shop.
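
For illustration only, one way to think about a server=... option is
that it wraps the command that starts the subprocess in an ssh
invocation, and everything else stays the same.  The sketch below is an
assumption about the general shape, not a copy of what Sage's
interfaces do; remote_argv is a hypothetical helper.

```python
import shlex


def remote_argv(command, server=None, allocate_tty=True):
    """Return the argv that runs `command` locally, or on `server` via ssh."""
    local = shlex.split(command)
    if server is None:
        return local
    argv = ["ssh"]
    if allocate_tty:
        argv.append("-t")  # ask ssh to allocate a pseudo-terminal on the remote side
    argv.append(server)
    # ssh hands the remaining text to the remote shell, so quote each word.
    argv.append(" ".join(shlex.quote(word) for word in local))
    return argv


if __name__ == "__main__":
    print(remote_argv("sage"))
    print(remote_argv("sage", server="sage.math.washington.edu"))
```

The resulting list can be handed to whatever spawns the subprocess, so
the code that reads and writes to the process does not need to know
whether it is local or remote.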

This remote server stuff has been in Sage since 2005, and hasn't been
changed in the slightest bit since then.  That's why I'm advertising
it now, since it would be cool to see some people work on it and
improve it.  For example, for people without ssh keys, one could
*easily* make it so the following works:

sage: s = Mathematica(server="sage.math.washington.edu")
password: xxx

sage: s = Mathematica(server="w...@sage.math.washington.edu")
password: xxx

Scripted logins via pexpect are in fact the raison d'etre for pexpect
in the first place, and would be easy to add.   There are also bound
to be all kinds of subtle issues with server=... that haven't been
found due to lack of use.  A good test would be to try to force the
gap or maxima interfaces to run 100% remotely (by editing
interfaces/gap.py or interfaces/maxima.py), then try to run the Sage
test suite and see what goes wrong.
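
A scripted login of the kind described above can be sketched without a
real server.  In the example below the "remote" side is a stand-in
child process (FAKE_LOGIN) that prints a password prompt, and the
parent waits for that prompt on a pty and types the answer.  It uses
only the standard library rather than pexpect, all names are
hypothetical, and a real ssh session would stay open after login
instead of being read to the end as it is here.

```python
import os
import pty
import select
import subprocess
import sys
import time

# Stand-in for a remote login: prompts, reads one line, accepts only "xxx".
FAKE_LOGIN = (
    "import sys\n"
    "sys.stdout.write('password: ')\n"
    "sys.stdout.flush()\n"
    "ok = sys.stdin.readline().strip() == 'xxx'\n"
    "print('login ok' if ok else 'login denied')\n"
)


def read_until(fd, marker, timeout=10.0):
    """Read from fd until `marker` appears; raise on timeout or end of output."""
    buf = b""
    deadline = time.monotonic() + timeout
    while marker not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0 or not select.select([fd], [], [], remaining)[0]:
            raise TimeoutError("no %r within %s seconds" % (marker, timeout))
        try:
            chunk = os.read(fd, 1024)
        except OSError:  # EIO on Linux once the child has closed the pty
            chunk = b""
        if not chunk:
            raise EOFError("child exited before printing %r" % marker)
        buf += chunk
    return buf


def read_to_end(fd, timeout=10.0):
    """Collect output from fd until the child closes it or nothing arrives."""
    chunks = []
    while select.select([fd], [], [], timeout)[0]:
        try:
            chunk = os.read(fd, 1024)
        except OSError:
            break
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def scripted_login(argv, password, prompt=b"password: "):
    """Start argv on a pty, answer its password prompt, return the last line."""
    master, slave = pty.openpty()
    child = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # the parent talks to the child through the master end only
    try:
        read_until(master, prompt)                    # wait for the prompt
        os.write(master, password.encode() + b"\n")   # type the password
        transcript = read_to_end(master).decode()
    finally:
        os.close(master)
        child.wait()
    # The pty echoes the typed password first; the verdict is the last line.
    return transcript.strip().splitlines()[-1]


if __name__ == "__main__":
    print(scripted_login([sys.executable, "-c", FAKE_LOGIN], "xxx"))
```

With pexpect itself the same steps are usually an expect call on the
prompt followed by a sendline with the password.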

With respect to the notebook, there is currently some reliance on a
shared filesystem for the worksheet processes.  I think this could
easily be fixed via some slight redesign, and I'll do this in October.
I could even make it so that there is an option for a given worksheet
(set in say a worksheet configuration pane) for that worksheet to run
as a given user on a given remote system.  Then whenever you use that
worksheet, you would have to log in to the remote system to start it
running, and afterwards all computations would happen using the
default "sage" command on that remote system over ssh.  I think
implementing this would be completely straightforward given the
current notebook design, and already this would provide a level of
flexibility and power that rivals anything the codenode design or
anybody else has suggested.  In case the above wasn't clear, one
could go to say https://sagenb.org, log in, but then have persistent
worksheet processes that run on sage.math.washington.edu, or any other
powerful specific computer you have an account on.  This would give
you access to your own build of Sage, commercial software on that
machine, etc.

So there is still some potential to the pseudotty approach to
controlling processes.  The main drawback in my mind is that it
works differently (and maybe not so well) on Windows, though it does
actually work there via the "Console API".

 -- William
