You can't view entity groups in the data viewer. You can trace an entity group programmatically by retrieving an entity's parent, then that parent's parent, and so on until you reach the root. More simply, you can use the entity's Key to retrieve parent Keys until getParent() returns null.
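A minimal sketch of that Key walk follows. To keep it runnable without the App Engine SDK, it uses a small stand-in Key class that models only getParent(), getKind() and getName(); the class name EntityGroupTrace, the helper names and the key names "u1" and "home" are made up for illustration. With the real SDK you would call the same loop on com.google.appengine.api.datastore.Key.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityGroupTrace {

    // Stand-in for com.google.appengine.api.datastore.Key (assumption: only
    // the accessors used below are modeled, so the sketch runs without the SDK).
    static final class Key {
        private final Key parent;
        private final String kind;
        private final String name;

        Key(Key parent, String kind, String name) {
            this.parent = parent;
            this.kind = kind;
            this.name = name;
        }

        Key getParent() { return parent; }
        String getKind() { return kind; }
        String getName() { return name; }
    }

    // Follows parent links until getParent() returns null; that key is the root.
    static Key rootOf(Key key) {
        Key current = key;
        while (current.getParent() != null) {
            current = current.getParent();
        }
        return current;
    }

    // Collects the key and its ancestors, starting at the key and ending at the root.
    static List<Key> pathToRoot(Key key) {
        List<Key> path = new ArrayList<>();
        for (Key k = key; k != null; k = k.getParent()) {
            path.add(k);
        }
        return path;
    }

    // Two keys belong to the same entity group when they share a root key.
    static boolean sameEntityGroup(Key a, Key b) {
        return rootOf(a).equals(rootOf(b));
    }

    public static void main(String[] args) {
        Key user = new Key(null, "User", "u1");       // root entity
        Key addr = new Key(user, "UserAddr", "home"); // child of the User
        Key root = rootOf(addr);
        System.out.println(root.getKind() + "/" + root.getName()); // prints "User/u1"
    }
}
```

Since the root key identifies the entity group, comparing the roots of two keys tells you whether their entities are in the same group.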
http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/Key.html

- Jason

On Mon, Oct 26, 2009 at 9:06 AM, Diana Cruise <diana.l.cru...@gmail.com> wrote:

> Relating to entity groups, how can we determine what entity group each entity belongs to? Using the Data Viewer, I would think we could examine this type of setup info for each entity but I have NOT found how to do that. Thanks!
>
> On Oct 23, 1:50 pm, "Jason (Google)" <apija...@google.com> wrote:
> > Hi Diana. As others have stated, App Engine can write to multiple entity groups in parallel, so if each User entity is a root entity or is otherwise placed in a different entity group, then there shouldn't be any issues. Regarding performance, all apps should generally be able to handle up to 30 simultaneous dynamic requests assuming a 75ms processing time for each (average load), for a throughput of 400 qps or so:
> >
> > http://code.google.com/appengine/docs/java/runtime.html#Quotas_and_Li...
> >
> > If you want any other performance or cost-related numbers, let me know.
> >
> > For updates to the same entity or entity group, App Engine uses optimistic concurrency as opposed to locking. If an entity is already being updated, then the second request will fail and will automatically get retried on the server. After consistent failures, an exception will be thrown which you can catch and handle gracefully. Datastore writes will fail from time to time, generally about 0.1 to 0.2 percent of the time, but the failure rate will be higher when there is contention, i.e. a high rate of simultaneous writes to the same entity/entity group.
> >
> > http://code.google.com/appengine/articles/scaling/contention.html
> >
> > - Jason
> >
> > On Thu, Oct 22, 2009 at 8:04 AM, Diana Cruise <diana.l.cru...@gmail.com> wrote:
> > > I'm glad to hear that the 1-10 requests/second is per User root entity...in my case this means that a huge number of Users logged in around the world should expect sub-second response even if tens of thousands clicked the Update button at the same instant in time!
> > >
> > > The only problem is we do NOT hear from anyone outside of Google to confirm performance of large volume for specific applications and what the real costs are!!!
> > >
> > > Regarding deadlock, I hear GAE does NOT bother with lock timeouts so as soon as a transaction tries to retrieve a record that is already locked, it will receive an error and have to retry.
> > >
> > > On Oct 19, 5:50 pm, "Dr. Flufenstein" <michael.brink...@gmail.com> wrote:
> > > > Preface: Please note, I'm not speaking for Google at all in this note and a lot of what I've written is speculation based on what I've read in various GAE docs as well as some meager knowledge of how relational DBs generally work. And yes, I know datastore isn't a relational DB, but I believe that their indexing implementation likely runs into many of the same problems you have with indexing relational data although that assumption could be completely wrong.
> > > >
> > > > From what I can tell, the update bottleneck you're referring to is for updating what you would often think of as a single record if you were persisting one instance of your User as a single denormalized record in a relational schema.
> > > > I suspect this bottleneck is due to the datastore architecture and the way that data updates are accumulated (possibly grouped/keyed by PK) in a queue, which is probably read from like a cache if read requests come in before the data has been flushed into the actual storage medium and replicated to the other datacenters.
> > > >
> > > > So if each of your users were updating their own User records, I don't believe you'd experience that limitation, which may be an artifact of how those in-memory queue/cache structures are managed/locked during updates (i.e. a new update for a record may be held until it's been flushed from the queue to the storage medium to prevent having to merge/reconcile records in the queue). If they were all updating a single shared record, then I think you'd hit this pretty quickly.
> > > >
> > > > Let's say though that your users are updating separate records...as your data size grows, you will probably see your update throughput decrease as other factors become dominant, and I believe this will primarily be dependent on the number and composition of the indexes on your data as well as the number of entities persisted. To me, this is the much riskier unknown because your average index structure is harder to update piecewise in parallel because the index must allow you to order/search all of the records' indexed columns. In an RDBMS like SQL Server or Oracle, you'd see some level of index locking take place during each transaction (maybe one page of an index) to allow concurrent updates to different sections of an index before the updates are committed, the transaction is ended and the locks are released.
> > > >
> > > > In relational persistence systems, this gets slower as the indexes become larger and is usually overcome with a technique like partitioning which, if you aren't familiar with it, sort of gives you a top level of your index tree where the data is actually spread into n groups of tables/indexes depending on some value in each record, and you usually pick a partition key so that data volume in each partition is kind of naturally balanced because rebalancing across partitions is expensive. I'm not sure that any kind of similar mechanism has been exposed in the GAE datastore right now and so a single index declared for an entity type is probably realized as one big index. I would hope that there's sub-index granularity for locking during updates, but I'm actually guessing that's not the case for a couple of reasons:
> > > >
> > > > 1) With most relational systems, you need to periodically rebuild the index or at least refresh the index statistics. I like to simplify this and think of rebuilding as rebalancing the data tree for optimal access speed while refreshing statistics typically just helps query optimizers decide whether use of an index should be preferred. On the GAE though, they require you to have an index for each combination of query parameters, so I suspect that statistics don't come into play. And I haven't seen a "rebuild my indexes" function in the admin UI although admittedly I haven't looked for one too hard so I wonder if they aren't trying to keep the data tree somewhat well balanced during each data update, which would require the entire index to potentially be locked.
> > > >
> > > > 2) I also haven't read anything yet about deadlock situations on GAE which can happen surprisingly easily if you're updating multiple indexes with enough concurrency and are using page locking. If you were designing the GAE datastore service, the way to avoid that situation would be to lock all indexes on each data update in the same order every time. You'd sacrifice a lot of throughput, but you'd never hit a deadlock so I suspect they've done something like this behind the scenes unless people just aren't using GAE heavily enough yet or the good people of the GAE have used some special sauce in the datastore service impl.
> > > >
> > > > So I guess what I'm trying to say is that I don't believe that you should be satisfied with any particular bit of performance data from another application because your mileage will almost certainly vary. I think that if you really want to know how your application would perform and want to find out before writing the whole app and sharing it with a billion users, I would recommend a very empirical approach:
> > > >
> > > > I'd write a sample app with an entity group where the entity widths and indexes are those that you think will be representative of your deployed application and then add a simple test harness that will:
> > > >
> > > > a) seed data to a point that you think is representative
> > > > b) update and query your data in what you believe will be a worst case scenario and then record the times
> > > >
> > > > I think the resulting curve of performance you see will be highly dependent on how you vary the seed data size and the number of indexes.
> > > > Of course there are more dimensions than that, such as the # of concurrent read operations and the # of concurrent write operations, that you can vary as well depending on what your performance requirements are.
> > > >
> > > > I hope this is somewhat helpful and I also hope that it's not totally incorrect and misleading since, as I said, it's all rampant speculation based on somewhat limited publicly available data.
> > > >
> > > > -Michael
> > > >
> > > > P.S. Of course, if anyone has data including # of records, #/composition of indexes, # reads per hour, # writes per hour and latency per txn, I'd be fascinated to hear about it too!
> > > >
> > > > On Oct 19, 4:01 pm, Diana Cruise <diana.l.cru...@gmail.com> wrote:
> > > > > This is exactly what I am talking about...in my case the User and UserAddr are both in the same Entity Group. So, are you saying that my application which has a global presence in GAE can only support 25 simultaneous Users performing this update in under 5 seconds?
> > > > >
> > > > > Again, I take the 1-10 requests per second response and go with the avg of 5/s. Add up 25 Users simultaneously hitting this Entity Group and that consumes a full 5 seconds. So, if you have 25 Users doing the same update over and over they will each have about a 5 second response.
> > > > >
> > > > > I know I am wrong because this is way LOW for a Google platform or any other...I just am NOT hearing or seeing numbers that say otherwise.
> > > > >
> > > > > If you clarify for me that this Entity Group performance stat of 1-10/s is granular to the Row then we're on to something... That would tell me that my scenario above only applies if ALL Users were logged into the same account!!!
> > > > > If the Entity Group performance stat is granular to the Row then that would mean an infinite number of Users would average 5 updates per second. Please tell me this is TRUE!