[sage-devel] Re: notebook rewrite

Alex Clemesha Sun, 23 Aug 2009 08:51:27 -0700

On Fri, Aug 21, 2009 at 5:56 PM, Yoav Aner<y...@gingerlime.com> wrote:
>
> Sounds like a great idea to me to de-couple the notebook from sage.
> Appengine is not the only option, though (but maybe the cheapest, at
> least for now); you could probably use an Amazon EC2 instance just as
> easily (and with more facilities at your disposal, since you have a
> virtual server running).
>
> Some more input from a security perspective: de-coupling the notebook
> and the processing engine is one of the key recommendations in my
> threat model (http://groups.google.com/group/sage-devel/browse_thread/thread/4bf627a69e0401c0);
> more details will be available once I complete a draft of the entire
> paper, and the final version is due by 4th September.
I just read your paper (http://www.gingerlime.com/sageNotebookThreatModel.pdf),
and it's very impressive how in depth you go, nice job.


I wanted to point out a couple of things related to de-coupling the notebook
from sage, and the current security situation in the sage notebook.

A good portion of the 'security'-related code (HTTP sessions) in the
sage notebook was written by me (see 'sage/server/notebook/avatars.py' or
'sage/server/notebook/run_notebook.py', etc.), and it is old and crufty and
probably has some security vulnerabilities.  I've long since realized that
trying to write your own HTTP sessions framework is a bad idea (obviously).

As you point out, decoupling the notebook from sage and using more
well-established frameworks (like Django) is an excellent way to improve
security, because you have hundreds of people testing, using, and writing
the code for you.  In fact, I have started a project called codenode
(formerly called Knoboo, sometimes misspelled as Knooboo) that is exactly
what you describe: a de-coupled sage notebook that uses Django.
See here: http://codenode.org and here: http://github.com/codenode/codenode


>
> As for having the notebook run on appengine, it would probably be
> more straightforward to use Robert's model, i.e. user->notebook on
> appengine->sage backend. Otherwise issues like user authentication
> (token management), synchronisation, etc. sound like a potential
> nightmare to me. This 'standard' architecture still has its own
> issues, particularly with appengine. I don't believe Google allows you
> to initiate ssh connections to a backend (for the pexpect interface),
> only web-based requests. Google also tries to push users to have a
> Google account to authenticate. That might be a good or a bad thing,
> depending on your perspective. Amazon EC2 gives you more flexibility
> in that respect, I believe. I would personally avoid either from a
> vendor lock-in perspective, but that's just me.
One of the "backends" of codenode can be Google App Engine, which
is awesome because you get the security benefits that come along with
running arbitrary code on Google's servers.  You can try it out
right now here: http://live.codenode.org

Additionally, codenode works fine using EC2 as a backend.
In fact, the backend of live.codenode.org used to be EC2, but it was a little
expensive, so we are using App Engine for the time being (even though
App Engine is more limited). EC2 is essentially just a
"full-capabilities virtual machine instance" that is no different from running
(say) a VirtualBox or VMware instance on servers that you own yourself.


>
> Another plus for google appengine in terms of security: you get
> the added security that appengine provides over and above standard
> python, and you 'offload' any security problems with the notebook
> itself to google. However, if someone does hack your notebook, I'm not
> sure whether google will simply shut you down (they probably will). Of
> course, this only applies to the notebook code itself, and even then it
> won't solve any XSS issues for you. It obviously won't help with any
> security issue relating to the backend either, which is where the sage
> 'soft-spot' currently is.
>
> Unrelated to appengine, using a web framework like django is a good
> idea from a security standpoint. It should give you much more
> flexibility in terms of user authentication and authorisation, with
> support for many backends. That alone would be a good security
> improvement too.
Completely agreed.  I invite you to check out codenode in more detail.
You can get started by typing "easy_install codenode", or check out the
latest code at http://github.com/codenode/codenode


-Alex





>
>
> On Jul 21, 7:53 pm, William Stein <wst...@gmail.com> wrote:
>> On Tue, Jul 21, 2009 at 10:21 AM, Ondrej Certik <ond...@certik.cz> wrote:
>>
>> > On Tue, Jul 21, 2009 at 10:44 AM, William Stein<wst...@gmail.com> wrote:
>>
>> > > On Tue, Jul 21, 2009 at 9:39 AM, Ondrej Certik<ond...@certik.cz> wrote:
>>
>> > >> On Tue, Jul 21, 2009 at 1:58 AM, Robert
>> > >> Bradshaw<rober...@math.washington.edu> wrote:
>>
>> > >>> On Jul 20, 2009, at 9:02 PM, Ondrej Certik wrote:
>>
>> > >>>> Well, let me say that I really like to run things on the appengine,
>> > >>>> rather than constantly maintaining our own servers. I see no reason
>> > >>>> why the notebook cannot run on the appengine, with only the AJAX talking
>> > >>>> to our own server with Sage to actually evaluate the cells (and for
>> > >>>> many people, I think appengine itself could actually be enough). I
>> > >>>> still have to think about the best way to transfer data to the
>> > >>>> database with worksheets, though.
>>
>> > >>> +1, though for Sage we rely heavily on compiled code. I wonder how
>> > >>> much introduced latency there would be if the backend were served on
>> > >>> a university computer, and the front end in appengine.
>>
>> > >> I think none; it would be as fast as it is now (i.e. the browser
>> > >> communicating directly with the engine).
>>
>> > > How is it "none", given that there are now three separate computers
>> > > involved instead of two?  There would have to be a little extra
>>
>> > What I meant is that the latency between typing 1+1 into the cell and
>> > getting the output cell saying 2 should not change at all, because the
>> > javascript in the browser sends a POST request to the Sage engine
>> > (i.e. a web app with the url interface, just like it is now) and it
>> > returns the result directly to the browser.
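The "engine as a web app with a URL interface" idea can be sketched with Python's standard library alone. This is a hypothetical illustration, not the Sage notebook's actual interface: the `/eval` path, the JSON field names, and the use of Python's `eval` in place of a real Sage session are all assumptions, and `eval` is not a sandbox, so nothing like this should be exposed on a real server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EngineHandler(BaseHTTPRequestHandler):
    """Toy compute engine: accepts POST /eval with a JSON body {"code": ...}."""

    def do_POST(self):
        if self.path != "/eval":  # hypothetical URL, not the notebook's real API
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        try:
            # Stand-in for a Sage evaluation. eval() is NOT a sandbox.
            result = {"output": repr(eval(request["code"], {"__builtins__": {}}))}
        except Exception as exc:
            result = {"error": f"{type(exc).__name__}: {exc}"}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def evaluate_remote(base_url, code):
    """What the browser's AJAX call does: POST the cell, read back the output."""
    req = urllib.request.Request(
        base_url + "/eval",
        data=json.dumps({"code": code}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), EngineHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d" % server.server_address[1]
    print(evaluate_remote(url, "1+1"))
    server.shutdown()
    server.server_close()
```

The front end never sees the cell in this flow; only the browser and the engine talk, which is why the latency would match the current single-server setup.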
>>
>> Thanks for the clarification, since I clearly misunderstood you.  Robert
>> said "backend were served on a university computer, and the front end in
>> appengine."  You seem to be eliminating the frontend completely when
>> computations are done.  I.e., do you imagine appengine *just* serving some
>> javascript and a database interface, and basically nothing else?  So what
>> would happen is the following:
>>
>> 1. User visits the appengine server and gets the javascript for the sage
>> notebook (after authenticating).
>> 2. User starts a worksheet.   The javascript in the browser requests a "sage
>> engine token", and the appengine allocates a "compute engine" somewhere for
>> use by that user's worksheet.
>> 3. The user types "factor(2^197-1)" and their javascript *directly* connects
>> to the compute engine and runs the code "factor(2^197-1)".  It also connects
>> to the appengine and stores that "factor(2^197-1)" was input in the
>> database.
>> 4. The javascript in the browser gets back the answer to the factor query
>> and displays the result.
>> 5. The javascript in the browser later also stores the result in the app
>> engine database.
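The five steps William lists can be simulated in a single Python process to make the moving parts explicit. Everything here is a hypothetical sketch: the class names, the `|`-separated token format, and the choice of an HMAC over a secret shared by the front end and the engines (one common way to let an engine check a token without calling back to the front end) are assumptions, not code from Sage or codenode. Authentication in step 1 is assumed to have already happened.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical secret shared by the front end and every compute engine.
SHARED_SECRET = secrets.token_bytes(32)


def issue_engine_token(user, worksheet, ttl=3600, now=None):
    """Step 2: the front end signs (user, worksheet, expiry).
    Assumes user and worksheet names never contain '|'."""
    expiry = int((time.time() if now is None else now) + ttl)
    payload = f"{user}|{worksheet}|{expiry}"
    sig = hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_engine_token(token, now=None):
    """Engine side: return (user, worksheet) if the token is genuine and unexpired."""
    payload, _, sig = token.rpartition("|")
    expected = hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise PermissionError("bad signature")
    user, worksheet, expiry = payload.split("|")
    if (time.time() if now is None else now) > int(expiry):
        raise PermissionError("token expired")
    return user, worksheet


class FrontEnd:
    """Stands in for the appengine side: token issuing plus worksheet storage."""

    def __init__(self):
        self.cells = {}  # worksheet -> list of [input, output]

    def start_worksheet(self, user, worksheet):
        self.cells.setdefault(worksheet, [])
        return issue_engine_token(user, worksheet)

    def store_input(self, worksheet, code):
        self.cells[worksheet].append([code, None])
        return len(self.cells[worksheet]) - 1

    def store_output(self, worksheet, index, output):
        self.cells[worksheet][index][1] = output


class ComputeEngine:
    """Stands in for a Sage engine; eval() is a placeholder, not a sandbox."""

    def run(self, token, code):
        verify_engine_token(token)
        return repr(eval(code, {"__builtins__": {}}))


def browser_session(front, engine, user, worksheet, code):
    token = front.start_worksheet(user, worksheet)  # steps 1-2
    index = front.store_input(worksheet, code)      # step 3: record the input
    output = engine.run(token, code)                # step 3: talk to the engine directly
    # step 4: the browser displays `output`; step 5: persist it on the front end
    front.store_output(worksheet, index, output)
    return output
```

Note that the browser makes three separate requests per cell here (store input, evaluate, store output), which is where the extra traffic and the failure cases discussed below come from.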
>>
>> I think there could be some weird security issues/tricks involved with the
>> javascript in the browser directly doing AJAX calls to the "compute engine"
>> above, but there are hacks to get around that.
>
> That should only be possible if you use a common domain name, e.g.
> notebook.sagenb.org and engine.sagenb.org. It seems like Google
> supports using your own domain names. It seems like a rather odd
> architecture to me, and like I said, a potential nightmare to manage
> and secure.
>
>> There's also twice the communications overhead between the user's javascript and remote machines
>> than in the current Sage notebook model where everything goes through the
>> notebook server.  For example, if the output of a Sage command (in steps 4
>> and 5 above) is large, e.g., a 10MB image, then that image is going to go all
>> over the place, both uploaded and downloaded, which will be incredibly
>> expensive.
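A rough tally of the traffic over the user's own connection makes the 10MB example concrete. This is a simplified model under stated assumptions: in the current design the engine and notebook server share a machine, so the output is downloaded once; in the direct design the browser downloads it from the engine (step 4) and then uploads it to the front end (step 5). Bandwidth charges on the front-end side are not modelled.

```python
def user_link_traffic(output_mb, model):
    """Megabytes crossing the user's connection for one cell output (simplified)."""
    if model == "current":
        # Engine and notebook server on one machine: one download to the browser.
        return {"down": output_mb, "up": 0}
    if model == "direct":
        # Download from the engine (step 4), then upload to the front end (step 5).
        return {"down": output_mb, "up": output_mb}
    raise ValueError(f"unknown model: {model}")


for model in ("current", "direct"):
    t = user_link_traffic(10, model)
    print(model, t, "total:", t["down"] + t["up"], "MB")
```

Upload bandwidth on consumer connections is often much lower than download bandwidth, so the extra upload in the direct model may cost more time than the raw megabyte count suggests.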
>
> Also consider how you handle authentication here. Both the notebook
> frontend and the sage backend need to know that the user is authorised
> to run a computation. Currently all users are 'equal', but if you
> implement different permissions in future, they may determine the user's
> level of access, e.g. which backend systems (python, shell, magma...)
> are available to the user, and perhaps even how much CPU/memory is allocated.
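One way to represent the per-user permissions described here is a small policy object that both the front end and the engine consult before running anything. The field names, the default limits, and the `authorize` helper are hypothetical; actually enforcing CPU and memory limits on the engine (for example with OS resource limits) is a separate problem this sketch does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserPolicy:
    """Hypothetical per-user limits shared by front end and engine."""
    allowed_systems: frozenset = frozenset({"sage", "python"})
    cpu_seconds: int = 60
    memory_mb: int = 512


def authorize(policy, system, cpu_seconds, memory_mb):
    """Raise PermissionError unless the request fits the user's policy."""
    if system not in policy.allowed_systems:
        raise PermissionError(f"{system!r} is not enabled for this user")
    if cpu_seconds > policy.cpu_seconds or memory_mb > policy.memory_mb:
        raise PermissionError("requested resources exceed the user's quota")


student = UserPolicy()
staff = UserPolicy(frozenset({"sage", "python", "shell", "magma"}), 600, 4096)
authorize(staff, "magma", cpu_seconds=120, memory_mb=1024)  # permitted
```

If the front end issues signed engine tokens, the same policy could travel inside the token so that the engine does not need its own copy of the user database.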
>
>>
>> > What changes is the database storage, i.e. either the javascript in
>> > the browser, once it receives the output of the cells, also sends it to
>> > the appengine (or wherever the database is running), or the engine
>> > sends it itself; I don't know yet which approach is better. So there
>> > are some issues involved, like what happens if one of those connections fails.
>> > But as long as both connections are up and running, the user would not
>> > notice anything at all.
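Whichever side ends up writing the output to the database, the "one of those connections fails" case can be handled by buffering results until the write succeeds. The sketch below assumes a hypothetical `store` callable that raises `ConnectionError` when the database is unreachable; it stops at the first failure so that results are written in order.

```python
from collections import deque


class ResultOutbox:
    """Queue cell outputs locally until the database write succeeds."""

    def __init__(self, store):
        self.store = store      # hypothetical: writes one result, raises ConnectionError
        self.pending = deque()  # (worksheet, cell_index, output), oldest first

    def record(self, worksheet, cell_index, output):
        self.pending.append((worksheet, cell_index, output))
        self.flush()

    def flush(self):
        """Write pending results in order; return True once the queue is empty."""
        while self.pending:
            try:
                self.store(*self.pending[0])
            except ConnectionError:
                return False  # link is down; try again later
            self.pending.popleft()
        return True
```

In the browser-writes variant the queue would live in the page and be lost if the tab closes; in the engine-writes variant it survives as long as the engine process does, which is one argument for the engine doing the write.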
>>
>> This is an interesting design. It hadn't occurred to me before.  It would be
>> interesting to see whether it is any good or not (I can't tell).
>>
>> I can tell you one thing, which is that when I start working on the notebook
>> again seriously this September, my first goal will be to create a powerful
>> system for simulating the load of n people all using the notebook at once in
>> a potentially heterogeneous way (say from several different computers,
>> etc.).  This testing code will hopefully be generic enough to work with
>> codenode, sagenb, etc.   I think having actual benchmark testing code will
>> in the long run be a better litmus test for designs than us just thinking
>> about them in the abstract.
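A load simulator of the kind described here could start as something like the following: n simulated users issue requests concurrently and the harness records per-request latency. This is a hypothetical sketch; `fake_evaluate` only sleeps, and a real harness would replace it with an HTTP client for codenode, sagenb, etc., and would run clients from several machines rather than threads on one.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_users(n_users, requests_per_user, evaluate):
    """Run n simulated users concurrently; return every request latency in seconds."""

    def one_user(user_id):
        latencies = []
        for i in range(requests_per_user):
            start = time.perf_counter()
            evaluate(user_id, f"{i}+1")  # one notebook round trip
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        per_user = pool.map(one_user, range(n_users))
        return [t for user in per_user for t in user]


def fake_evaluate(user_id, code):
    """Placeholder for a real client call; sleeps to mimic network + compute time."""
    time.sleep(random.uniform(0.001, 0.005))
    return code


latencies = simulate_users(20, 5, fake_evaluate)
print(len(latencies), "requests, median %.4fs, max %.4fs"
      % (statistics.median(latencies), max(latencies)))
```

Because `evaluate` is passed in, the same harness could drive any of the designs in this thread, which is the point of comparing them by measurement rather than argument.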
>>
>> I could pronounce the design you suggest above as "bad" for several reasons,
>> but what if I'm wrong and in fact the design above, with some tweaks and
>> insights that would result from testing, turns out to be amazingly good?
>>
>> > > latency, i.e., whatever there is between appengine and the "sage
>> > > engine".  That said, the internet is pretty fast these days :-).  And
>> > > the scalability of a decoupled approach like we're talking about is a
>> > > big plus, if it works.
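The latency question reduces to counting round trips on the critical path. The sketch below encodes the three topologies discussed in this thread under simplifying assumptions (compute time ignored, each hop costs one round trip); the link names and the example numbers are illustrative, not measurements.

```python
def round_trip_ms(topology, rtt):
    """Time for one cell round trip, ignoring compute time.

    `rtt` maps a link name to its round-trip time in milliseconds.
      current: browser <-> notebook server (engine on the same machine)
      relayed: browser <-> front end <-> engine (front end forwards each cell)
      direct:  browser <-> engine, with the front end updated separately
    """
    if topology == "current":
        return rtt["browser-server"]
    if topology == "relayed":
        return rtt["browser-frontend"] + rtt["frontend-engine"]
    if topology == "direct":
        # The write to the front end can happen after the output is shown,
        # so it is not on the path the user waits for.
        return rtt["browser-engine"]
    raise ValueError(f"unknown topology: {topology}")


# Illustrative numbers only, not measurements.
example = {"browser-server": 40, "browser-frontend": 30,
           "frontend-engine": 40, "browser-engine": 40}
for name in ("current", "relayed", "direct"):
    print(name, round_trip_ms(name, example), "ms")
```

Under these assumptions only the relayed design adds a hop, which matches the "little extra latency" concern; the direct design trades that latency for the extra traffic and token handling discussed elsewhere in the thread.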
>>
>> > Right, it has to be tried to see if it works. But I think it's worthy.
>>
>> > > By the way, if you haven't already, I personally think you should
>> > > start a mailing list, web page, trac, etc. for a separate notebook
>> > > project, since you're already writing code.   There's already some
>> > > confusion about where we are supposed to have this discussion -- and a
>> > > funny mix of sage-devel and codenode doesn't seem right.
>>
>> > Well, I hope the codenode guys could pick this up and they would be the
>> > notebook. Unfortunately, I probably can't spend much time on this
>> > until September. But I wanted to get this going to see which approach
>> > to take.
>>
>> Hey, same here.  Yay for September.
>>
>> > I wrote the above in about 2 days (roughly), but it's only the first
>> > 90%, i.e. the cells sort of work, but the remaining 10%, like tab
>> > completion, worksheets, saving, loading, publishing, users, and fixing it
>> > so that it works 100% in all browsers, would take a lot more,
>> > and I can't do it yet. But I hope this encourages all of you to
>> > learn some AJAX too before September, so that we can work on this
>> > together. :)
>>
>> > There is one more thing I want to try -- pyjamas, as pointed out
>> > above. I already played with it yesterday, and what I saw so far is
>> > *impressive*. So my next step will be to rewrite what I did in
>> > pyjamas (i.e. pure Python both on the server and in the browser).
>> > If that works and I think it could, well, that would be the way to go,
>> > since I could debug all those functions like for calculating cursor
>> > positions etc. in Python.
>>
>> I strongly encourage you to test pyjamas with the above.  I think that's the
>> best possible next step.
>>
>>  -- William
>
> Yoav
> >
>



-- 
Alex Clemesha
clemesha.org
