[sage-devel] Re: notebook rewrite

William Stein Sun, 11 Oct 2009 02:38:41 -0700

On Wed, Aug 19, 2009 at 1:19 AM, Robert Bradshaw
<rober...@math.washington.edu> wrote:
>
> On Jul 22, 2009, at 11:12 AM, ghtdak wrote:
>
>> On Jul 21, 6:40 pm, "Dr. David Kirkby" <david.kir...@onetel.net>
>> wrote:
>>> ghtdak wrote:
>>>
>>>> This thread has gotten long and there are many subjects embedded
>>>> within.
>>>
>>>> One of the problems I've had with the notebook implementation is
>>>> that the sage process supporting the notebook computation blocks on
>>>> the pipe between itself and the twistd server which spawns it.  This
>>>> means that one can't build an asynchronous event handler without
>>>> using threads.
>>>
>>>> More conventional web server architectures use a callback style for
>>>> invocation, making it much easier to splice in other events for
>>>> handling by the main thread (this is the general asynchronous
>>>> programming model and the heart of how Twisted works).
>>>
>>>> Perhaps it is a foregone conclusion that this approach will be taken
>>>> in the rewrite.  If not, I'd like to suggest that it is an important
>>>> consideration.
>>>
>>>> -glenn
>>>
>>> I don't know how the web server is implemented. I know it did not
>>> work on my Solaris box, but that is another matter.
>>>
>>> But actually including Apache might be a sensible choice. A lot of
>>> people know how to administer Apache. It offers a lot of flexibility.
>>> You can, for example, serve pages only to particular IP addresses.
>>>
>>> Worth a thought anyway.
>>
>> I came across this post from the Twisted folks. It looks like they do
>> WSGI and run Django quite well...
>>
>> http://blog.dreid.org/2009/03/twisted-django-it-wont-burn-down-your.html
>>
>> At least there are alternatives to apache which might be simpler.
>>
>> The core question remains: while the presentation layer is handled by
>> Django, Pyjamas, etc., the real issue is the integration with the
>> underlying sage process, not the web presentation.
>>
>> In the current architecture, a twistd daemon spawns a notebook server
>> which is responsible for doing "sage" stuff.  twistd is fully
>> asynchronous, but the notebook process itself is a pexpect-based
>> blocking process connected to twistd by pipes.  As such, pexpect's
>> blocking read keeps the sage process from servicing asynchronous
>> events.
>>
>> IMHO, this architecture is incorrect and limited... Perhaps this is
>> part of what is being rethought... if not, I believe it should be.
>>
>> A preferable architecture is an event loop which dispatches requests
>> within the sage process.  Since Sage is written in Python, I would
>> suggest Twisted for this, but there might be better alternatives (I'd
>> be surprised, but it's possible).
>>
>> Using this approach, one could easily add other elements to the core
>> event loop to support asynchronous processing (timers, communication,
>> etc.) without threads, which are unnecessary in this case.  Threads
>> are bad enough when they're necessary; when they're introduced because
>> of unnecessary blocking, one gets the whole threading nightmare without
>> any benefit. (reminds me of health care reform)
>>
>> Another benefit is that since asynchronous event processing is the
>> widely accepted approach to this type of problem, there are lots of
>> libraries / packages to "make it so".
>
> It might be a bit off topic, but personally I think an actual
> multithreaded app, where some threads may be blocked (and that's not a
> problem because the other threads can continue on), is sometimes
> easier to reason about than having to do everything asynchronously.
> The asynchronous model works well when processing each event is
> relatively quick or has a natural callback, but otherwise it often
> feels like having to manually enforce multitasking so as not to block
> the entire reactor. Multithreading will have to be introduced at one
> level or another to scale the notebook to more than a single
> processor anyway.
>
> - Robert
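Glenn's proposed architecture, an event loop that waits on the worker's pipe instead of blocking on it, can be sketched in a few lines. This is a minimal sketch that uses Python's standard-library asyncio as a stand-in for Twisted's reactor (in Twisted itself the corresponding tool is reactor.spawnProcess with a ProcessProtocol); CHILD_CODE and the function names are hypothetical. It only shows that a timer keeps firing while the loop waits for the child's output.

```python
import asyncio
import contextlib
import sys

# Hypothetical stand-in for the "sage" worker: it computes for a while,
# then writes one line of output to its stdout pipe.
CHILD_CODE = "import time; time.sleep(0.5); print('result: 42', flush=True)"


async def ticker(ticks: list, interval: float) -> None:
    """A periodic timer; it only fires when the event loop is not blocked."""
    while True:
        ticks.append(None)
        await asyncio.sleep(interval)


async def run_worker_without_blocking() -> tuple:
    ticks: list = []
    timer = asyncio.create_task(ticker(ticks, 0.05))
    # Spawn the worker with a pipe on stdout, much as twistd spawns the
    # notebook process.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE, stdout=asyncio.subprocess.PIPE
    )
    # Awaiting the read suspends only this coroutine; the loop keeps
    # running the timer (or any other handler) until the data arrives.
    line = await proc.stdout.readline()
    await proc.wait()
    timer.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await timer
    return line.decode().strip(), len(ticks)


if __name__ == "__main__":
    output, n_ticks = asyncio.run(run_worker_without_blocking())
    print(output, "| timer ticks while waiting:", n_ticks)
```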


I'm reviving this thread because I just got very curious about how to
solve, in one particular case, the problem you (= Robert Bradshaw) were
alluding to above, since I'm working on the notebook all weekend.  As a
reminder, here is the problem:  In the Sage notebook, we want to have
a feature where the user can click "Download all worksheets" and the
notebook server will prepare a zip archive of all their worksheets,
then hand it to the user.  Robert Bradshaw implemented a function to
do this a few months ago, but it is disabled on the public Sage
notebook servers.  Why?  Because while the zip archive is being
created, the notebook server simply ignores all other requests.  In
particular, say I have 500 worksheets and creating sws files and
zipping them all up takes 30 seconds (that's about how long it
actually takes).  Then when I click that "Download all" link, all of
http://sagenb.org will appear to be down to everybody in the world for
the next 30 seconds.  Not good, especially given that
http://sagenb.org has over 4000 more users now than it did a month
ago...
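To see why one blocking call takes the whole server down, here is a hedged sketch, again with asyncio standing in for Twisted. make_archive, heartbeat, and the sample worksheets are hypothetical stand-ins for the notebook's zip code and for other users' requests. Because make_archive is called directly on the event-loop thread, the heartbeat gets no turns at all until the archive is finished.

```python
import asyncio
import contextlib
import io
import zipfile


def make_archive(worksheets: dict) -> bytes:
    """Blocking work: zip every worksheet into one in-memory archive.
    (Hypothetical stand-in for the notebook's "Download all" code.)"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in worksheets.items():
            zf.writestr(name + ".sws", text)
    return buf.getvalue()


async def heartbeat(ticks: list) -> None:
    """Stands in for answering other users: one tick per turn it gets."""
    while True:
        ticks.append(None)
        await asyncio.sleep(0)


async def serve_download_blocking(worksheets: dict) -> tuple:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # give the heartbeat its first turn
    before = len(ticks)
    data = make_archive(worksheets)  # called directly: the loop is stuck here
    turns_during = len(ticks) - before
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return data, turns_during


if __name__ == "__main__":
    sample = {f"worksheet{i}": f"x = {i}\n" * 1000 for i in range(500)}
    data, turns = asyncio.run(serve_download_blocking(sample))
    # turns is 0: nobody else was served while the archive was built.
    print(len(data), "bytes; heartbeat turns during the build:", turns)
```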

The Sage Notebook is a Twisted application, and Twisted's "deferreds"
might seem like a good way to solve the above problem.  However, they
are actually *not* at all meant to solve this sort of problem, which I
think the Twisted documentation makes very clear: it lists two types of
async problems, CPU-bound and "waiting for a resource"-bound.  The
problem, at its simplest level, is that no matter what you do with
Twisted deferreds (e.g., making the zip file little by little),
everything happens in a single thread, and the Sage notebook server
has to spend a total of at least 30 seconds of CPU time making that
zip archive.  That's 30 seconds during which the notebook server isn't
responding to users, so overall the notebook is going to feel sluggish.
Also, it just seems dumb to slow the notebook server down like this,
given that, e.g., sagenb.org is running on an 8-core virtual machine.
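The point about doing it "little by little" can be sketched the same way. Here `await asyncio.sleep(0)` between worksheets plays roughly the role that chaining deferreds (or twisted.internet.task.cooperate) would play in Twisted; this is an asyncio analogue and all names are hypothetical. The heartbeat now gets turns between chunks, but every chunk still runs on the single event-loop thread, so the CPU time spent there is not reduced.

```python
import asyncio
import contextlib
import io
import threading
import zipfile


async def heartbeat(ticks: list) -> None:
    """Stands in for answering other users: one tick per turn it gets."""
    while True:
        ticks.append(None)
        await asyncio.sleep(0)


async def make_archive_in_chunks(worksheets: dict, threads_used: set) -> bytes:
    """Build the archive one worksheet at a time, yielding to the loop
    between worksheets (the 'little by little' approach)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in worksheets.items():
            threads_used.add(threading.get_ident())  # record who did the work
            zf.writestr(name + ".sws", text)
            await asyncio.sleep(0)  # let other handlers run
    return buf.getvalue()


async def serve_download_chunked(worksheets: dict) -> tuple:
    ticks: list = []
    threads_used: set = set()
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)
    before = len(ticks)
    data = await make_archive_in_chunks(worksheets, threads_used)
    turns_during = len(ticks) - before
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return data, turns_during, threads_used


if __name__ == "__main__":
    sample = {f"worksheet{i}": f"x = {i}\n" * 1000 for i in range(500)}
    data, turns, used = asyncio.run(serve_download_chunked(sample))
    print("heartbeat turns:", turns, "| threads that did the zipping:", len(used))
```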

Fortunately,

http://twistedmatrix.com/projects/core/documentation/howto/gendefer.html

gives an example similar to this problem and explains how to solve it
easily, in two lines, using *threads*.  So I took the big chunk of
scary blocking code that Robert Bradshaw wrote, put it in a closure (a
little nested function f), and added the following two lines to the
server:

        from twisted.internet import threads
        return threads.deferToThread(f)

That's it.  It worked on the first try and solves the problem.  What
happens behind the scenes is that Twisted runs the one function f in a
separate thread, and when f completes, the deferred returned by
deferToThread fires with f's return value.  So it wraps the idea of
"do something in a thread" in a deferred.
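The threaded fix can be mimicked with asyncio.to_thread (Python 3.9+), which, like threads.deferToThread, runs a function in a thread pool and resumes the caller with its return value. This is an analogue, not the notebook's code: f, make_archive, and heartbeat are hypothetical. The heartbeat keeps getting turns while f builds the archive on a worker thread.

```python
import asyncio
import contextlib
import io
import threading
import zipfile


def make_archive(worksheets: dict) -> bytes:
    """Blocking work: zip every worksheet into one in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in worksheets.items():
            zf.writestr(name + ".sws", text)
    return buf.getvalue()


async def heartbeat(ticks: list) -> None:
    """Stands in for answering other users: one tick per turn it gets."""
    while True:
        ticks.append(None)
        await asyncio.sleep(0)


async def serve_download_threaded(worksheets: dict) -> tuple:
    ticks: list = []
    worker_threads: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)

    def f() -> bytes:
        # The "big chunk of blocking code", wrapped in a closure.
        worker_threads.append(threading.get_ident())
        return make_archive(worksheets)

    before = len(ticks)
    # Rough asyncio analogue of threads.deferToThread(f): run f in the
    # default thread pool and resume here with its return value.
    data = await asyncio.to_thread(f)
    turns_during = len(ticks) - before
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return data, turns_during, worker_threads[0]


if __name__ == "__main__":
    sample = {f"worksheet{i}": f"x = {i}\n" * 1000 for i in range(500)}
    data, turns, worker = asyncio.run(serve_download_threaded(sample))
    print("heartbeat turns while f ran:", turns)
```

Two general caveats may bear on the drawbacks question below. First, most of Twisted, including the reactor, is not thread-safe, so code run under deferToThread should hand results back through the deferred (or reactor.callFromThread) rather than touching reactor or shared notebook state directly. Second, under CPython's global interpreter lock, pure-Python bytecode in the worker thread does not run in parallel with the reactor thread, so spreading the work across the other cores of an 8-core machine would take separate processes.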

Twisted experts -- please explain the drawbacks of this approach...

By the way, the Twisted documentation has got way way better than I
remember it being in 2006.

 -- William

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
