[sage-devel] Re: [sage-notebook] Scalable Sage Server Architecture Proposal v0.1

Alex Leone Sat, 01 Jan 2011 19:15:45 -0800

>
> Alex -- can you also post your document to the wiki (or a link to it)?
> http://wiki.sagemath.org/Notebook%20scalability
>


 Done.  It's in the Notes section.



> 1. I wouldn't do an "isAdmin" property for users. Rather, create one or
> > more groups that are marked as isAdmin and then add the users to that
> > group. This is basically how it is done nowadays in Linux via the
> > /etc/sudoers file where a group "admin" is marked as being special and
> > the sudo command checks if the user is in the group admin.


This makes sense.  I thought that it would have to be a property on each
user so that lookups would be fast, but I realize that it would probably
just be set as a session variable when the user logs in.
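A rough sketch of that login-time check, assuming hypothetical group documents with `is_admin` and `members` fields (these names and the session layout are made up for illustration, not part of the proposal):

```python
# Hypothetical group documents; field names are assumptions for illustration.
groups = [
    {"name": "admin", "is_admin": True, "members": ["alice"]},
    {"name": "students", "is_admin": False, "members": ["bob"]},
]

def login(username, groups):
    """Build a session dict, computing the admin flag once at login."""
    is_admin = any(g["is_admin"] and username in g["members"] for g in groups)
    return {"user": username, "is_admin": is_admin}

session = login("alice", groups)
```

Later requests would only read session["is_admin"], so no group lookup is needed per request.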



> 2. The permissions, I don't really understand it. Why are they in each
> > group?
>

For worksheets, I could think of a few different kinds of permissions:
1. viewing
2. editing
3. changing the title?
4. deleting the worksheet

For groups, there aren't as many, but I thought it would be good to reuse the
same mechanism:
1. adding other people to the group
2. changing the group name?
3. deleting the group

The 'perms' number is a bit-field.  If the first bit (0b0001) is set, then
the user has permission to do x.  If the second bit (0b0010) is set, the
user has permission to do y, etc.
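A minimal sketch of how such a bit-field could be checked. Which bit maps to which permission is an assumption here (the constant names and the `has_perm` helper are hypothetical):

```python
# Assumed bit assignments for worksheet permissions (illustrative only).
PERM_VIEW = 0b0001
PERM_EDIT = 0b0010
PERM_RENAME = 0b0100
PERM_DELETE = 0b1000

def has_perm(perms, needed):
    """Return True if every bit in `needed` is set in `perms`."""
    return (perms & needed) == needed

# A user who may view and edit, stored as the single number 3.
perms = PERM_VIEW | PERM_EDIT
can_edit = has_perm(perms, PERM_EDIT)
can_delete = has_perm(perms, PERM_DELETE)
```

The same helper would work for group permissions with a different set of constants.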




> but if there is some crazy long output it might happen.
>

If the output gets too long, it would get saved to a separate file, just
like the current notebook saves long output to "full_output.txt".



> > Second,
> > updates on worksheets only happen on the cell level, never on the
> > whole document. I know, mongodb has the ability to update a part of a
> > document via the update command, but I think it's easier to have a
> > collection of all cells and reference to them.
>
> I'm not sure.  If you read the mongodb documentation/books, the way Alex
> laid things out (with all cells in a single document) is repeatedly
> recommended by them as the way to go.  Updating
> parts of documents with mongodb is very robust, in my experience.
> Also, the data locality (having all the cells in the same document) is
> evidently a big win efficiency wise.
>
> > But still, when a cell is updated, only its "out" field is modified.
>
> Its "in" field can also be modified, right, e.g., when you modify the
> input?  And maybe somebody even modifies the type (why not?).
>

I considered both.  Here's what I was thinking about:

1. Keep a list of references to cells, with the cells themselves in a
separate collection (db.cells):

  a. If there were ever fine-grained revision history (e.g., see Google Docs), old
cell contents could stay in the db (maybe as diffs), and the worksheet
object wouldn't get huge.  But then again, this could be implemented as a
diff of the whole worksheet object or something.

2. Put the cells in the worksheet object (as proposed):

  a.  Like William said, it might be better to have all the data localized.



> Alex, I don't think you should use an _id field in the individual
> cells though.  They aren't complete mongodb documents themselves, so
> don't have to have an "_id" field, and if they do it isn't treated
> specially like the _id of a complete mongodb document (which is
> forced to be unique, etc.).   Thus using _id could be misleading.
>

This id helps keep track of cells on the client-side, and also if the cells
get rearranged.  Perhaps just 'id' would be a better name.



> 4. something trivial, instead of
> > out: [{ t:"stdout", data: "..."} , {t:"stderr", data: "..."}]
> > please just do
> > out: { stdout: "...", stderr: "..." }
> > Mongodb allows to list all keys in such an associative list and no
> > need for this {t: "..."} thing.
> > (or even better, get rid of "out" and just a stdout and stderr key is
> > good enough since their relative ordering doesn't matter.)
>
> +1  -- very good idea.
>

The output from a cell is a sequence of messages (Stdout, Stderr, Stdin,
Html, ...).  Consider the following code:

import sys
sys.stdout.write("out1")
sys.stderr.write("err1")
sys.stdout.write("out2")
sys.stderr.write("err2")

this would generate

Stdout("out1")
Stderr("err1")
Stdout("out2")
Stderr("err2")

The messages need to be displayed in the order that they are produced.
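A small sketch of why the list form matters, using the `t` and `data` keys from the quoted proposal (the `merge_by_stream` helper is hypothetical and only simulates what the `{ stdout: ..., stderr: ... }` form would store):

```python
# The cell output as an ordered list of messages.
messages = [
    {"t": "stdout", "data": "out1"},
    {"t": "stderr", "data": "err1"},
    {"t": "stdout", "data": "out2"},
    {"t": "stderr", "data": "err2"},
]

def merge_by_stream(messages):
    """Collapse messages into one string per stream, as a keyed dict would."""
    merged = {}
    for m in messages:
        merged[m["t"]] = merged.get(m["t"], "") + m["data"]
    return merged

# The list replays in the order the output was produced.
replayed = "".join(m["data"] for m in messages)

# The keyed form no longer records that err1 came before out2.
merged = merge_by_stream(messages)
```

Here `replayed` keeps the interleaving, while `merged` only has the concatenated text of each stream.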



> 5. Images might probably be referenced explicitly, i.e. out: { img:
> > <file-id-reference> }
>

I was thinking that there would be a Plot(...) message, a JMol(...) message,
etc, which would reference files.


Currently in the notebook, any computation output is just a stream of bytes.
But that stream contains different kinds of data - stdout, stderr, LaTeX,
plots, html tables, jmol plots, references to data files that the cell
created, etc.  So why not have the computation output be that series of
"messages"?
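One way the message types could look, as a sketch only. The class fields, the `file_id` reference, and the `to_doc` helper are assumptions for illustration, not a settled format:

```python
from dataclasses import dataclass, asdict

@dataclass
class Stdout:
    data: str

@dataclass
class Stderr:
    data: str

@dataclass
class Plot:
    file_id: str  # assumed: a reference to a stored image file

def to_doc(msg):
    """Serialize a message to a dict tagged with its lowercased type name."""
    doc = {"t": type(msg).__name__.lower()}
    doc.update(asdict(msg))
    return doc

# A cell's output is then a list of such documents, in production order.
out = [to_doc(m) for m in (Stdout("x = 3\n"), Plot("file-123"), Stderr("warning\n"))]
```

New kinds of output (HTML tables, JMol plots, data files) would just be additional message classes.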

 - Alex

