[google-appengine] Re: 1 application, multiple datastores

hawkett Wed, 07 Jan 2009 05:48:09 -0800

> Huh?  How can you make a "wrong call" that doesn't have any
> parameters?
>
> Here's the application code:
>      {operations on application-wide datastore}
>      change_to_application_userstore() # note - no parameters
>      {operations on user-specific datastore}
>      {return to user}
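
A minimal sketch of the parameterless call quoted above, assuming a hypothetical runtime that owns the user-to-datastore mapping. `Runtime`, `Datastore`, and the kind names `Catalog` and `Note` are invented for illustration; none of them are GAE APIs.

```python
class Datastore:
    """Toy datastore that only accepts the entity kinds in its schema."""

    def __init__(self, name, kinds):
        self.name = name
        self.kinds = set(kinds)
        self.rows = []

    def put(self, kind, value):
        if kind not in self.kinds:
            raise KeyError(f"{self.name} has no kind {kind!r}")
        self.rows.append((kind, value))


class Runtime:
    """Hypothetical platform runtime; the application never sees store names."""

    def __init__(self, current_user):
        self._user = current_user  # set by the platform from the login session
        self._app_store = Datastore("app-wide", ["Catalog"])
        self._user_stores = {}     # user -> that user's private datastore
        self.current = self._app_store

    def change_to_application_userstore(self):
        # No parameters: the store is chosen from the logged-in user only.
        self.current = self._user_stores.setdefault(
            self._user, Datastore(f"user:{self._user}", ["Note"]))


rt = Runtime(current_user="alice")
rt.current.put("Catalog", "shared item")        # application-wide operation
rt.change_to_application_userstore()
rt.current.put("Note", "alice's private note")  # user-specific operation
```

In this toy model, switching too late sends the `Note` write to the app-wide store, and switching too early sends the `Catalog` write to the user store; both fail with `KeyError` because the schemas differ.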


OK - I understand (maybe).  I don't think it matches what 106 is asking
for, though - none of these data stores appear to be accessible between
applications - they all appear to be tied to a single application.  Or
are you saying the user-specific data store is portable between
applications?  I.e. my application can access it via db APIs, and so
can yours, provided the user is logged in?

If you don't intend portability of the user store, I agree that the
risk is different, and much lower, because the partitioning mechanism
does at least exist, and the chance of a bug is *much* lower because
the actual db query is likely to be different.  When we were talking
about cross app queries, the db schemas in each data store were likely
to be the same, which made the risk of data exposure very high.  In
the implementation you now describe, the user data store and the
application data store probably have substantially different schemas.
The datastores with the same schema (the user stores) are partitioned.  I can see
value in this approach, although it does add complexity.

Essentially you are recommending strict data partitioning (aka 945)
plus a shared application datastore?

If you intend for the user data store to be portable between apps,
then I have problems with that approach.  I think it should use a
specific data API, and not db level access.  There's too much
unwarranted trust involved between the apps - i.e. you have to trust
that I read/write the db properly, as does everyone else - I imagine
over time such a shared database would get very 'dirty'.  If you use
an API then it can enforce structure and data integrity through
validation.  The portable user datastore (if that is what you are
suggesting) is a good idea, but I think it is something that Google
has already implemented to some degree with their social data API -
i.e. a bunch of data attached to your identity.  I guess it depends on
your implementation how useful this is.
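
To illustrate the difference between db-level access and a data API, here is a small sketch, assuming a hypothetical profile store shared by several apps. `SharedProfileAPI` and its field rules are invented for the example and are not part of any Google API.

```python
class SharedProfileAPI:
    """Hypothetical API in front of a user store shared by several apps."""

    # Assumed schema: field name -> required type.
    FIELDS = {"display_name": str, "birth_year": int}

    def __init__(self):
        self._store = {}  # user_id -> validated profile dict

    def write(self, user_id, field, value):
        expected = self.FIELDS.get(field)
        if expected is None:
            raise ValueError(f"unknown field: {field}")
        if not isinstance(value, expected):
            raise TypeError(f"{field} must be {expected.__name__}")
        self._store.setdefault(user_id, {})[field] = value

    def read(self, user_id):
        # Return a copy so callers cannot bypass validation by mutating it.
        return dict(self._store.get(user_id, {}))


api = SharedProfileAPI()
api.write("alice", "display_name", "Alice")

raw_store = {}                                  # db-level access, no checks
raw_store["alice"] = {"birth_year": "unknown"}  # silently stores a bad type
```

The same bad value passed to `api.write` would be rejected with `TypeError`, whereas the raw dictionary accepts it, which is the kind of 'dirty' data a shared db would accumulate.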

To me, the portability of data and data partitioning should be treated
separately.

The other thing to note is that in order to map users to data
partitions, you need one of two things -
1.  An API that your application can use to do so - accidentally mapping
the wrong user to the wrong data store = data exposure problem.
2.  Some form of platform-supplied user provisioning - aka 945

Which of the above are you proposing?
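
The exposure risk in option 1 can be sketched as follows, assuming a hypothetical `open_userstore(user_id)` call that trusts whatever id the application passes. All names and data here are illustrative only.

```python
USER_STORES = {
    "alice": ["alice's invoice"],
    "bob": ["bob's invoice"],
}


def open_userstore(user_id):
    """Option 1 (hypothetical): the application chooses the partition."""
    return USER_STORES[user_id]


def handle_request(logged_in_user, requested_user):
    # Bug: uses an id taken from the request instead of the session.
    return open_userstore(requested_user)


def handle_request_platform(logged_in_user, requested_user):
    """Option 2 (hypothetical): the partition follows the session only."""
    return USER_STORES[logged_in_user]


leaked = handle_request(logged_in_user="alice", requested_user="bob")
safe = handle_request_platform(logged_in_user="alice", requested_user="bob")
```

With alice logged in, `leaked` holds bob's data while `safe` holds only alice's, because the second handler has no way to name another user's partition.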

On Jan 6, 2:46 pm, Andy Freeman <ana...@earthlink.net> wrote:
> > I guess one of us will be surprised then :) - I would be surprised if
> > Gmail, Sites, Blogger, Picasa, Orkut etc. all operated in an open
> > space and avoided data exposure through code implemented in each of
> > those applications.
>
> If the separation is by name and ordinary "file" access control, the
> "code implemented" consists of the name of the datastore for the
> application plus some application configuration that has to happen
> regardless.  I'm pretty sure that google thinks that their folks can
> open an application-specific datastore name reliably.  And, if they
> fail, they're talking to a datastore with the wrong structure.
>
> Or, are you thinking that those applications use a different datastore
> per external user?  (If "separate datastore per user" is the usage
> pattern, bigtable requires far less concurrency support than the
> report mentions.)
>
> > - and does not give DB level access to it.  So I think just by
> > observing google's current architecture, it makes sense that they
> > wouldn't break with that tradition at the application level for GAE.
> > And not just because it's tradition, but because it is rooted in sound
> > architectural principles
>
> What "db level access" are you talking about?  The result of that open
> call is used by every other bigtable operation, including all db
> operations performed at the datastore.  Unless GAE works differently,
> the runtime has access to that result.
>
> > > Not so fast.  Who said anything about application visible tokens?  In
> > > fact, it could be just "change_to_application_userstore", where a
> > > userstore is an ordinary GAE datastore.  This could easily be written
> > > so it doesn't take any parameters from application code, which makes
> > > it just as secure as an "open datastore" call done at process startup.
> > And regardless, you can easily introduce the cited bug based on your
> > clarification.  Simply make the wrong call to 'change_to_datastore',
> > and you still have the exposure problem.  When your code is
> > responsible for selecting the datastore, you can introduce the bug.
> > This is fairly obvious.
>
> Huh?  How can you make a "wrong call" that doesn't have any
> parameters?
>
> Here's the application code:
>      {operations on application-wide datastore}
>      change_to_application_userstore() # note - no parameters
>      {operations on user-specific datastore}
>      {return to user}
>
> The runtime knows what user and the mapping from said user to an
> application-specific datastore.  The application doesn't specify the
> user and doesn't even know the name of the datastore.
>
> There are only two mistakes that the application writer can make -
> calling change_to_application_userstore too early or too late.
>
> If the change_to_application_userstore() call is too late, the
> application will try to perform some user-specific operations on the
> application-wide datastore, but those will likely fail because its
> structure is completely different.  Note that the application doesn't
> have access to any data from the user's datastore at that point.
>
> If the change_to_application_userstore() call is too early, the
> application will try to perform some application-generic operations on
> the user's datastore, but those will likely fail for the same reason
> as above.  Moreover, this can't leak user data because the application
> only has access to the user's datastore at that point.
>
> > You are still asserting that application code carries the same
> > robustness profile as a platform code.
>
> No, I'm not.  I'm pointing out that the platform includes the run-time
> and that run-time can provide meaningful services in this area.  If
> it's already providing related services, and I'm pretty sure that it
> is calling "open_application_datastore" with some application-specific
> key on startup, this doesn't change the risk profile.
>
> Do you really want to argue that the platform code in the run-time has
> a significantly different "robustness profile" than platform code
> running on a different server?  (If I'm correct about it already
> providing related services, you're actually arguing about the relative
> robustness of related run-time code.)  Would platform code running in
> a different process on the same machine have yet another robustness
> profile?
>
> On Jan 6, 4:57 am, hawkett <hawk...@gmail.com> wrote:
>
> > > > > How do you know how the current GAE code actually works?
>
> > > > I read the API docs - how do you manage it?
>
> > > I'm not the one asserting that there are hard boundaries between GAE
> > > datastores that the GAE run-time can't pierce.
>
> > Neither am I - I am asserting that there are hard boundaries that you
> > or I can't pierce, and that is a feature of the security
> > architecture.  The API docs bear out that assertion.  I do *expect*
> > that data partitioning is a DB layer feature, but as I said
> > previously, I don't know that.
>
> > > It is generally believed that GAE is built on top of BigTable, which
> > > has a lot of internal Google users.  I don't know that all of them can
> > > work with only one datastore; I'd guess that several require to access
> > > multiple datastores simultaneously.  So, if there is a BigTable-level
> > > "only one datastore" and/or "can't switch" restriction, I'd be very
> > > surprised if it was universal or could only be pierced by suid
> > > applications.
>
> > I guess one of us will be surprised then :) - I would be surprised if
> > Gmail, Sites, Blogger, Picasa, Orkut etc. all operated in an open
> > space and avoided data exposure through code implemented in each of
> > those applications.  That seems a ludicrous architecture to me - which
> > is my point in this thread I guess. It makes much more sense to me to
> > have the partitioning logic at the DB level (like a standard database
> > tablespace), and for those applications to leverage that.  Then they
> > expose API's to access their data at the application level - not use
> > the DB API's.
>
> > Google does, in fact, expose API's for data access -
> > http://code.google.com/apis/gdata/
> > - and does not give DB level access to it.  So I think just by
> > observing google's current architecture, it makes sense that they
> > wouldn't break with that tradition at the application level for GAE.
> > And not just because it's tradition, but because it is rooted in sound
> > architectural principles.
>
> > > Not so fast.  Who said anything about application visible tokens?  In
> > > fact, it could be just "change_to_application_userstore", where a
> > > userstore is an ordinary GAE datastore.  This could easily be written
> > > so it doesn't take any parameters from application code, which makes
> > > it just as secure as an "open datastore" call done at process startup.
>
> > > Or, it could support one token, so the application has access to the
> > > "default" datastore and a datastore determined by such a call.  Again,
> > > that call need not take parameters from application code.
>
> > I think this is getting away from the 106 proposal now, which states -
> > 'This feature request is about allowing cross app queries using the db
> > APIs only'
>
> > And regardless, you can easily introduce the cited bug based on your
> > clarification.  Simply make the wrong call to 'change_to_datastore',
> > and you still have the exposure problem.  When your code is
> > responsible for selecting the datastore, you can introduce the bug.
> > This is fairly obvious.
>
> > > This could easily be written so it doesn't take any parameters from 
> > > application code, which makes
> > > it just as secure as an "open datastore" call done at process startup.
>
> > You are still asserting that application code carries the same
> > robustness profile as a platform code.  This is clearly not the case.
> > If there are N applications implementing the application API, vs just
> > the platform implementing the platform API, then it is a simple matter
> > of statistics to show that you will get at least N times as many
> > bugs.  In fact it will be much more than N, because the volume of
> > testing on the platform will be N times greater, and the
> > implementation process will be much more rigorous than for most
> > applications.  Without doing the analysis, I would expect the platform
> > fragility (e.g. fragility = defects per month) to decrease
> > exponentially as N increases.  Using the application API, I expect
> > fragility would remain roughly constant, and unrelated to N.  But
> > there is a hidden bigger problem - if fragility remains constant on a
> > per app basis, then customers see app engine as a minefield - which
> > apps are well implemented?  The one they choose could be a broken
> > one.  How would they know?
>
> > This means across the board, the risk of data exposure _from the
> > customer perspective_ is much worse if partitioning logic is performed
> > in application code.
>
> > What do you think of the possibility of being able to decide when you
> > deploy your app how strict the data partitioning should be?  In the
> > marketplace concept, the customer could be made aware of the
> > strictness of data partitioning when they sign up.  My main concern is
> > protecting customer data, and giving customers confidence in the data
> > security of the GAE platform.  This is how I read the intent of the
> > original poster as well.
>
> > On Jan 6, 3:40 am, Andy Freeman <ana...@earthlink.net> wrote:
>
> > > > > > As it stands GAE does not allow cross data store queries,
> > > > > > and from my perspective that is an aspect of the security
> > > > > > architecture.  106 wants that aspect 'relaxed'.
>
> > > > > How do you know how the current GAE code actually works?
>
> > > > I read the API docs - how do you manage it?
>
> > > I'm not the one asserting that there are hard boundaries between GAE
> > > datastores that the GAE run-time can't pierce.
>
> > > It is generally believed that GAE is built on top of BigTable, which
> > > has a lot of internal Google users.  I don't know that all of them
>
> ...