[appengine-java] Re: GAE Performance

Jason (Google) Tue, 27 Oct 2009 14:18:40 -0700

You can't view entity groups in the data viewer. You can trace an entity
group programmatically by retrieving each entity above a given entity (its
parent) recursively until you reach the root. More simply, you can use the
entity's Key to retrieve parent Keys until getParent() returns null.


http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/Key.html
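
A minimal sketch of that walk, for illustration only. To stay self-contained
it uses a hypothetical stand-in interface, ParentedKey, that exposes just a
getParent() method; with the SDK on the classpath you would presumably use
com.google.appengine.api.datastore.Key in its place. The class name, the
DemoKey helper, and the User/UserAddr labels are assumptions made for the
example.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityGroupTrace {

    /** Hypothetical stand-in exposing only the getParent() call used here. */
    interface ParentedKey {
        ParentedKey getParent();
    }

    /** Returns the root key: the first ancestor whose getParent() is null. */
    static ParentedKey rootOf(ParentedKey key) {
        ParentedKey current = key;
        while (current.getParent() != null) {
            current = current.getParent();
        }
        return current;
    }

    /** Returns the chain from the given key up to and including the root. */
    static List<ParentedKey> pathToRoot(ParentedKey key) {
        List<ParentedKey> path = new ArrayList<>();
        for (ParentedKey k = key; k != null; k = k.getParent()) {
            path.add(k);
        }
        return path;
    }

    /** Illustrative in-memory key; not part of any SDK. */
    static final class DemoKey implements ParentedKey {
        private final String label;
        private final DemoKey parent;

        DemoKey(String label, DemoKey parent) {
            this.label = label;
            this.parent = parent;
        }

        @Override
        public ParentedKey getParent() {
            return parent;
        }

        @Override
        public String toString() {
            return label;
        }
    }

    public static void main(String[] args) {
        // A child key whose parent is a root key (parent == null).
        DemoKey user = new DemoKey("User", null);
        DemoKey addr = new DemoKey("UserAddr", user);
        System.out.println("root of " + addr + ": " + rootOf(addr));
        System.out.println("path: " + pathToRoot(addr));
    }
}
```

Two keys that resolve to the same root belong to the same entity group, so
comparing the results of rootOf() is enough to tell whether two entities
share a group.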

- Jason

On Mon, Oct 26, 2009 at 9:06 AM, Diana Cruise <diana.l.cru...@gmail.com> wrote:

>
> Relating to entity groups, how can we determine what entity group each
> entity belongs to?  Using the Data Viewer, I would think we could
> examine this type of setup info for each entity but I have NOT found
> how to do that.  Thanks!
>
> On Oct 23, 1:50 pm, "Jason (Google)" <apija...@google.com> wrote:
> > Hi Diana. As others have stated, App Engine can write to multiple entity
> > groups in parallel, so if each User entity is a root entity or is otherwise
> > placed in a different entity group, then there shouldn't be any issues.
> > Regarding performance, all apps should generally be able to handle up to 30
> > simultaneous dynamic requests assuming a 75ms processing time for each
> > (average load), for a throughput of 400 qps or so (30 / 0.075 s):
> >
> > http://code.google.com/appengine/docs/java/runtime.html#Quotas_and_Li...
> >
> > If you want any other performance or cost-related numbers, let me know.
> >
> > For updates to the same entity or entity group, App Engine uses optimistic
> > concurrency as opposed to locking. If an entity is already being updated,
> > then the second request will fail and will automatically get retried on the
> > server. After consistent failures, an exception will be thrown, which you
> > can catch and handle gracefully. Datastore writes will fail from time to
> > time, generally about 0.1 to 0.2 percent of the time, but the failure rate
> > will be higher when there is contention, i.e. a high rate of simultaneous
> > writes to the same entity/entity group.
> >
> > http://code.google.com/appengine/articles/scaling/contention.html
> >
> > - Jason
> >
> > On Thu, Oct 22, 2009 at 8:04 AM, Diana Cruise <diana.l.cru...@gmail.com> wrote:
> >
> > > I'm glad to hear that the 1-10 requests/second is per User root
> > > entity...in my case this means that a huge number of Users logged in
> > > around the world should expect sub-second response even if tens of
> > > thousands clicked the Update button at the same instant in time!
> >
> > > The only problem is we do NOT hear from anyone outside of Google to
> > > confirm performance of large volume for specific applications and what
> > > the real costs are!!!
> >
> > > Regarding deadlock, I hear GAE does NOT bother with lock timeouts so as
> > > soon as a transaction tries to retrieve a record that is already
> > > locked, it will receive an error and have to retry.
> >
> > > On Oct 19, 5:50 pm, "Dr. Flufenstein" <michael.brink...@gmail.com>
> > > wrote:
> > > > Preface: Please note, I'm not speaking for Google at all in this note
> > > > and a lot of what I've written is speculation based on what I've read
> > > > in various GAE docs as well as some meager knowledge of how relational
> > > > DBs generally work.  And yes, I know datastore isn't a relational DB,
> > > > but I believe that their indexing implementation likely runs into many
> > > > of the same problems you have with indexing relational data, although
> > > > that assumption could be completely wrong.
> >
> > > > From what I can tell, the update bottleneck you're referring to is for
> > > > updating what you would often think of as a single record if you were
> > > > persisting one instance of your User as a single denormalized record
> > > > in a relational schema.  I suspect this bottleneck is due to the
> > > > datastore architecture and the way that data updates are accumulated
> > > > (possibly grouped/keyed by PK) in a queue, which is probably read from
> > > > like a cache if read requests come in before the data has been flushed
> > > > into the actual storage medium and replicated to the other
> > > > datacenters.
> >
> > > > So if each of your users were updating their own User records, I don't
> > > > believe you'd experience that limitation, which may be an artifact of
> > > > how those in-memory queue/cache structures are managed/locked during
> > > > updates (i.e. a new update for a record may be held until it's been
> > > > flushed from the queue to the storage medium to prevent having to
> > > > merge/reconcile records in the queue).  If they were all updating a
> > > > single shared record, then I think you'd hit this pretty quick.
> >
> > > > Let's say though that your users are updating separate records...as
> > > > your data size grows, you will probably see your update throughput
> > > > decrease as other factors become dominant, and I believe this will
> > > > primarily be dependent on the number and composition of the indexes on
> > > > your data as well as the number of entities persisted.  To me, this is
> > > > the much riskier unknown because your average index structure is
> > > > harder to update piecewise in parallel because the index must allow
> > > > you to order/search all of the records' indexed columns.  In an RDBMS
> > > > like SQL Server or Oracle, you'd see some level of index locking take
> > > > place during each transaction (maybe one page of an index) to allow
> > > > concurrent updates to different sections of an index before the
> > > > updates are committed, the transaction is ended and the locks are
> > > > released.
> >
> > > > In relational persistence systems, this gets slower as the indexes
> > > > become larger and is usually overcome with a technique like
> > > > partitioning which, if you aren't familiar with it, sort of gives you
> > > > a top level of your index tree where the data is actually spread into
> > > > n groups of tables/indexes depending on some value in each record, and
> > > > you usually pick a partition key so that data volume in each partition
> > > > is kind of naturally balanced because rebalancing across partitions is
> > > > expensive.  I'm not sure that any kind of similar mechanism has been
> > > > exposed in the GAE datastore right now and so a single index declared
> > > > for an entity type is probably realized as one big index.  I would
> > > > hope that there's sub-index granularity for locking during updates,
> > > > but I'm actually guessing that's not the case for a couple of reasons:
> >
> > > > 1) With most relational systems, you need to periodically rebuild the
> > > > index or at least refresh the index statistics.  I like to simplify
> > > > this and think of rebuilding as rebalancing the data tree for optimal
> > > > access speed while refreshing statistics typically just helps query
> > > > optimizers decide whether use of an index should be preferred.  On the
> > > > GAE though, they require you to have an index for each combination of
> > > > query parameters, so I suspect that statistics don't come into play.
> > > > And I haven't seen a "rebuild my indexes" function in the admin UI
> > > > although admittedly I haven't looked for one too hard so I wonder if
> > > > they aren't trying to keep the data tree somewhat well balanced during
> > > > each data update, which would require the entire index to potentially
> > > > be locked.
> >
> > > > 2) I also haven't read anything yet about deadlock situations on GAE
> > > > which can happen surprisingly easily if you're updating multiple
> > > > indexes with enough concurrency and are using page locking.  If you
> > > > were designing the GAE datastore service, the way to avoid that
> > > > situation would be to lock all indexes on each data update in the same
> > > > order every time.  You'd sacrifice a lot of throughput, but you'd
> > > > never hit a deadlock so I suspect they've done something like this
> > > > behind the scenes unless people just aren't using GAE heavily enough
> > > > yet or the good people of the GAE have used some special sauce in the
> > > > datastore service impl.
> >
> > > > So I guess what I'm trying to say is that I don't believe that you
> > > > should be satisfied with any particular bit of performance data from
> > > > another application because your mileage will almost certainly vary.
> > > > I think that if you really want to know how your application would
> > > > perform and want to find out before writing the whole app and sharing
> > > > it with a billion users, I would recommend a very empirical approach:
> >
> > > > I'd write a sample app with an entity group where entity widths and
> > > > indexes are those that you think will be representative of your
> > > > deployed application and then add a simple test harness that will:
> >
> > > > a) seed data to a point that you think is representative
> > > > b) update and query your data in what you believe will be a worst case
> > > > scenario and then record the times
> >
> > > > I think the resulting curve of performance you see will be highly
> > > > dependent on how you vary the seed data size and the number of
> > > > indexes.  Of course there are more dimensions than that, such as the #
> > > > of concurrent read operations and the # of concurrent write
> > > > operations, that you can vary as well depending on what your
> > > > performance requirements are.
> >
> > > > I hope this is somewhat helpful and I also hope that it's not totally
> > > > incorrect and misleading since, as I said, it's all rampant
> > > > speculation based on somewhat limited publicly available data.
> >
> > > > -Michael
> >
> > > > P.S.  Of course, if anyone has data including # of records,
> > > > #/composition of indexes, # reads per hour, # writes per hour and
> > > > latency per txn, I'd be fascinated to hear about it too!
> >
> > > > On Oct 19, 4:01 pm, Diana Cruise <diana.l.cru...@gmail.com> wrote:
> >
> > > > > This is exactly what I am talking about...in my case the User and
> > > > > UserAddr are both in the same Entity Group.  So, are you saying that
> > > > > my application which has a global presence in GAE can only support 25
> > > > > simultaneous Users performing this update in under 5 seconds?
> >
> > > > > Again, I take 1-10 requests per second response and go with the avg of
> > > > > 5/s.  Add up 25 Users simultaneously hitting this Entity Group and
> > > > > that consumes a full 5 seconds.  So, if you have 25 Users doing the
> > > > > same update over and over they will each have about a 5 second
> > > > > response.
> >
> > > > > I know I am wrong because this is way LOW for a Google platform or any
> > > > > other...I just am NOT hearing or seeing numbers that say otherwise.
> >
> > > > > If you clarify for me that this Entity Group performance stat of
> > > > > 1-10/s is granular to the Row then we're on to something...  That would
> > > > > tell me that my scenario above only applies if ALL Users were logged
> > > > > into the same account!!!  If the Entity Group performance stat is
> > > > > granular to the Row then that would mean an infinite number of Users
> > > > > would average 5 updates per second.  Please tell me this is TRUE!
> >
> > ...
> >
>
