[google-appengine] Re: 1 application, multiple datastores

Andy Freeman Fri, 09 Jan 2009 13:22:44 -0800

> Ok - I understand (maybe), I don't think it matches what 106 is asking
> for though


It doesn't support 106, but that wasn't the goal.

The goal was to show that one could support application-driven
datastore choice with an appropriate amount of security.

The call to support 106 would be different, but its existence would
not mean that an application using change_to_application_userstore()
was any less secure.

Both (and others) require different application configuration as well.

For sharing a datastore between apps, I'd go through an app that
managed said shared datastore, but that's something best left up to
the designer - it isn't a platform level decision.
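
To make the "go through an app that manages the shared datastore" idea
concrete, here is a minimal sketch. Everything in it is an assumption
for illustration: SharedStoreApp, put_profile, get_profile and the
field names are hypothetical, and a plain dict stands in for the shared
datastore. The only point is that other apps never touch the store
directly, so the owning app can validate every write.

```python
# Hypothetical sketch: SharedStoreApp and its methods are illustrative
# names, not part of any App Engine API.

class SharedStoreApp:
    """The owning app: the only code that touches the shared datastore."""

    # Assumed schema for the shared records.
    REQUIRED = {"email": str, "display_name": str}

    def __init__(self):
        self._profiles = {}  # stands in for the shared datastore

    def put_profile(self, caller_app, user_id, profile):
        # Reject records with missing or wrongly typed fields.
        for field, expected in self.REQUIRED.items():
            if not isinstance(profile.get(field), expected):
                raise ValueError(f"{caller_app}: bad or missing {field!r}")
        # Reject fields the schema does not know about.
        extra = set(profile) - set(self.REQUIRED)
        if extra:
            raise ValueError(f"{caller_app}: unknown fields {sorted(extra)}")
        self._profiles[user_id] = dict(profile)

    def get_profile(self, caller_app, user_id):
        # Return a copy so callers cannot mutate stored data in place.
        return dict(self._profiles[user_id])
```

Whether the owning app is reached over HTTP or some other mechanism is
a design choice for whoever builds it; the sketch only shows the
validation boundary.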

> Which of the above are you proposing?

I'm still not proposing anything.  I'm pointing out that GAE can
reasonably support a wide range of application to datastore access
patterns.
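
For anyone who wants to see the parameterless switch spelled out, here
is a rough, runnable sketch. It is not how GAE works and none of these
names are real App Engine APIs: Datastore, Runtime, _STORES and the
"Announcement"/"Note" kinds are all made up for illustration. The only
thing taken from the thread is that change_to_application_userstore()
takes no parameters, and that the run-time, not the application, knows
the user and the mapping from user to datastore.

```python
# Hypothetical sketch: none of these classes or functions are real
# App Engine APIs; they only model the call sequence in this thread.

class Datastore:
    """Toy datastore that only accepts the kinds it was created with."""

    def __init__(self, name, kinds):
        self.name = name
        self._rows = {kind: [] for kind in kinds}

    def put(self, kind, value):
        if kind not in self._rows:
            raise KeyError(f"{self.name} has no kind {kind!r}")
        self._rows[kind].append(value)

    def fetch(self, kind):
        if kind not in self._rows:
            raise KeyError(f"{self.name} has no kind {kind!r}")
        return list(self._rows[kind])


# Platform-side registry of datastores, keyed by name. Application
# code never reads or writes this mapping.
_STORES = {}


def _open(name, kinds):
    return _STORES.setdefault(name, Datastore(name, kinds))


class Runtime:
    """Stands in for the run-time handling one request."""

    def __init__(self, app_id, logged_in_user):
        self._app_id = app_id
        self._user = logged_in_user  # set by the platform, not the app
        self._current = _open(f"{app_id}/app", ["Announcement"])

    def change_to_application_userstore(self):  # note - no parameters
        name = f"{self._app_id}/user/{self._user}"
        self._current = _open(name, ["Note"])

    def put(self, kind, value):
        self._current.put(kind, value)

    def fetch(self, kind):
        return self._current.fetch(kind)


def handle_request(runtime, note):
    runtime.put("Announcement", "maintenance at noon")  # app-wide store
    runtime.change_to_application_userstore()
    runtime.put("Note", note)                           # user's store
    return runtime.fetch("Note")
```

In this toy version, a handler that switches too late gets a KeyError
when it tries to put a "Note" into the application-wide store, and one
that switches too early gets a KeyError when it tries to put an
"Announcement" into the user's store. In neither case does it see
another user's rows, because it never names a user or a datastore.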


On Jan 7, 5:28 am, hawkett <hawk...@gmail.com> wrote:
> > Huh?  How can you make a "wrong call" that doesn't have any
> > parameters?
>
> > Here's the application code:
> >      {operations on application-wide datastore}
> >      change_to_application_userstore() # note - no parameters
> >      {operations on user-specific datastore}
> >      {return to user}
>
> Ok - I understand (maybe), I don't think it matches what 106 is asking
> for though - none of these data stores appear to be accessible between
> applications - they all appear to be tied to a single application - or
> are you saying the user specific data store is portable between
> applications? i.e. my application can access it via db APIs, and so
> can yours, provided the user is logged in?
>
> If you don't intend portability of the user store, I agree that the
> risk is different, and much lower, because the partitioning mechanism
> does at least exist, and the chance of a bug is *much* lower because
> the actual db query is likely to be different.  When we were talking
> about cross app queries, the db schemas in each data store were likely
> to be the same, which made the risk of data exposure very high.  In
> the implementation you now describe, the user data store and the
> application data store probably have substantially different schemas.
> The datastores with the same schema (user) are partitioned.  I can see
> value in this approach, although it does add complexity.
>
> Essentially you are recommending strict data partitioning (aka 945)
> plus a shared application datastore?
>
> If you intend for the user data store to be portable between apps,
> then I have problems with that approach.  I think it should use a
> specific data API, and not db level access.  There's too much
> unwarranted trust involved between the apps - i.e. you have to trust
> that I read/write the db properly, as does everyone else - I imagine
> over time such a shared database would get very 'dirty'.  If you use
> an API then it can enforce structure and data integrity through
> validation.  The portable user datastore (if that is what you are
> suggesting) is a good idea, but I think it is something that google
> has already implemented to some degree with their social data API -
> i.e. a bunch of data attached to your identity.  I guess it depends on
> your implementation how useful this is.
>
> To me, the portability of data and data partitioning should be treated
> separately.
>
> The other thing to note is that in order to map users to data
> partitions, you need one of two things -
> 1.  An API that your application can use to do so - accidentally map the
> wrong user to the wrong data store = data exposure problem.
> 2.  Some form of platform supplied user provisioning - aka 945
>
> Which of the above are you proposing?
>
> On Jan 6, 2:46 pm, Andy Freeman <ana...@earthlink.net> wrote:
>
>
>
> > > I guess one of us will be surprised then :) - I would be surprised if
> > > > gmail, sites, blogger, picasa, orkut etc. all operated in an open
> > > space and avoided data exposure through code implemented in each of
> > > those applications.
>
> > If the separation is by name and ordinary "file" access control, the
> > "code implemented" consists of the name of the datastore for the
> > application plus some application configuration that has to happen
> > regardless.  I'm pretty sure that google thinks that their folks can
> > open an application-specific datastore name reliably.  And, if they
> > fail, they're talking to a datastore with the wrong structure.
>
> > Or, are you thinking that those applications use a different datastore
> > per external user?  (If "separate datastore per user" is the usage
> > pattern, bigtable requires far less concurrency support than the
> > report mentions.)
>
> > > - and does not give DB level access to it.  So I think just by
> > > observing google's current architecture, it makes sense that they
> > > wouldn't break with that tradition at the application level for GAE.
> > > > And not just because it's tradition, but because it is rooted in sound
> > > architectural principles
>
> > What "db level access" are you talking about?  The result of that open
> > call is used by every other bigtable operation, including all db
> > operations performed at the datastore.  Unless GAE works differently,
> > the runtime has access to that result.
>
> > > > Not so fast.  Who said anything about application visible tokens?  In
> > > > fact, it could be just "change_to_application_userstore", where a
> > > > userstore is an ordinary GAE datastore.  This could easily be written
> > > > so it doesn't take any parameters from application code, which makes
> > > > it just as secure as an "open datastore" call done at process startup.
> > > And regardless, you can easily introduce the cited bug based on your
> > > clarification.  Simply make the wrong call to 'change_to_datastore',
> > > and you still have the exposure problem.  When your code is
> > > responsible for selecting the datastore, you can introduce the bug.
> > > This is fairly obvious.
>
> > Huh?  How can you make a "wrong call" that doesn't have any
> > parameters?
>
> > Here's the application code:
> >      {operations on application-wide datastore}
> >      change_to_application_userstore() # note - no parameters
> >      {operations on user-specific datastore}
> >      {return to user}
>
> > The runtime knows what user and the mapping from said user to an
> > application-specific datastore.  The application doesn't specify the
> > user and doesn't even know the name of the datastore.
>
> > There are only two mistakes that the application writer can make -
> > calling change_to_application_userstore too early or too late.
>
> > If the change_to_application_userstore() call is too late, the
> > application will try to perform some user-specific operations on the
> > application-wide datastore, but those will likely fail because its
> > structure is completely different.  Note that the application doesn't
> > have access to any data from the user's datastore at that point.
>
> > If the change_to_application_userstore() call is too early, the
> > application will try to perform some application-generic operations on
> > the user's datastore, but those will likely fail for the same reason
> > as above.  Moreover, this can't leak user data because the application
> > only has access to the user's datastore at that point.
>
> > > You are still asserting that application code carries the same
> > > robustness profile as a platform code.
>
> > No, I'm not.  I'm pointing out that the platform includes the run-time
> > and that run-time can provide meaningful services in this area.  If
> > it's already providing related services, and I'm pretty sure that it
> > is calling "open_application_datastore" with some application-specific
> > key on startup, this doesn't change the risk profile.
>
> > Do you really want to argue that the platform code in the run-time has
> > a significantly different "robustness profile" than platform code
> > running on a different server?  (If I'm correct about it already
> > providing related services, you're actually arguing about the relative
> > robustness of related run-time code.)  Would platform code running in
> > a different process on the same machine have yet another robustness
> > profile?
>
> > On Jan 6, 4:57 am, hawkett <hawk...@gmail.com> wrote:
>
> > > > > > How do you know how the current GAE code actually works?
>
> > > > > I read the API docs - how do you manage it?
>
> > > > I'm not the one asserting that there are hard boundaries between GAE
> > > > datastores that the GAE run-time can't pierce.
>
> > > Neither am I - I am asserting that there are hard boundaries that you
> > > or I can't pierce, and that is a feature of the security
> > > architecture.  The API docs bear out that assertion.  I do *expect*
> > > that data partitioning is a DB layer feature, but as I said
> > > previously, I don't know that.
>
> > > > It is generally believed that GAE is built on top of BigTable, which
> > > > has a lot of internal Google users.  I don't know that all of them can
> > > > > work with only one datastore; I'd guess that several need to access
> > > > > multiple datastores simultaneously.  So, if there is a BigTable-level
> > > > > "only one datastore" and/or "can't switch" restriction, I'd be very
> > > > > surprised if it was universal or could only be pierced by suid
> > > > applications.
>
> > > I guess one of us will be surprised then :) - I would be surprised if
> > > > gmail, sites, blogger, picasa, orkut etc. all operated in an open
> > > space and avoided data exposure through code implemented in each of
> > > those applications.  That seems a ludicrous architecture to me - which
> > > is my point in this thread I guess. It makes much more sense to me to
> > > have the partitioning logic at the DB level (like a standard database
> > > tablespace), and for those applications to leverage that.  Then they
> > > > expose APIs to access their data at the application level - not use
> > > > the DB APIs.
>
> > > > Google does, in fact, expose APIs for data access
> > > > - http://code.google.com/apis/gdata/
> > > - and does not give DB level access to it.  So I think just by
> > > observing google's current architecture, it makes sense that they
> > > wouldn't break with that tradition at the application level for GAE.
> > > > And not just because it's tradition, but because it is rooted in sound
> > > architectural principles.
>
> > > > Not so fast.  Who said anything about application visible tokens?  In
> > > > fact, it could be just "change_to_application_userstore", where a
> > > > userstore is an ordinary GAE datastore.  This could easily be written
> > > > so it doesn't take any parameters from application code, which makes
> > > > it just as secure as an "open datastore" call done at process startup.
>
> > > > Or, it could support one token, so the application has access to the
> > > > "default" datastore and a datastore determined by such a call.  Again,
> > > > that call need not take parameters from
>
> ...
