I'm glad to hear that the 1-10 requests/second limit is per User root
entity...in my case this means that a huge number of Users logged in
around the world should expect sub-second response even if tens of
thousands clicked the Update button at the same instant!
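
To put rough numbers on that, here is a back-of-envelope sketch. It assumes
the 1-10 writes/second limit applies to each entity group independently and
takes 5/s as the midpoint; the class and method names are made up for
illustration, and this is a simple model, not a measurement of GAE.

```java
public class EntityGroupEstimate {
    // Worst-case wait, in seconds, for the last of `writers` simultaneous
    // updates that all target the SAME entity group, assuming writes to one
    // group are applied one after another at `writesPerSecond`.
    static double worstCaseWaitSeconds(int writers, double writesPerSecond) {
        return writers / writesPerSecond;
    }

    public static void main(String[] args) {
        double assumedRate = 5.0; // assumed midpoint of the quoted 1-10/s range

        // 25 Users all updating one shared entity group
        System.out.println(worstCaseWaitSeconds(25, assumedRate)); // prints 5.0

        // Each User updating only their own entity group
        System.out.println(worstCaseWaitSeconds(1, assumedRate));  // prints 0.2
    }
}
```

Under those assumptions the wait depends only on how many writers share one
entity group, not on how many Users there are in total.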

The only problem is we do NOT hear from anyone outside of Google to
confirm performance at large volume for specific applications and what
the real costs are!!!

Regarding deadlock, I hear GAE does NOT bother with lock timeouts, so as
soon as a transaction tries to retrieve a record that is already
locked, it will receive an error and have to retry.
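
If that is right, the retry would look something like the sketch below. This
is plain Java, not the datastore API: ContentionException and runWithRetries
are names I made up to stand in for whatever error GAE actually raises when
a transaction loses the race, and the backoff values are guesses.

```java
import java.util.function.Supplier;

public class RetrySketch {
    /** Hypothetical stand-in for the error a contended transaction receives. */
    static class ContentionException extends RuntimeException {}

    // Runs `txn` at least once and at most `maxAttempts` times, sleeping a
    // little longer after each contention failure. If every attempt loses,
    // the last ContentionException is rethrown to the caller.
    static <T> T runWithRetries(Supplier<T> txn, int maxAttempts) {
        long backoffMillis = 10; // assumed starting delay
        for (int attempt = 1; ; attempt++) {
            try {
                return txn.get();
            } catch (ContentionException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
                backoffMillis *= 2; // back off further before the next try
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated transaction: loses the race twice, then succeeds.
        String result = runWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new ContentionException();
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The point is only that the caller has to own the retry; the work inside the
transaction needs to be safe to run more than once.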



On Oct 19, 5:50 pm, "Dr. Flufenstein" <michael.brink...@gmail.com>
wrote:
> Preface: Please note, I'm not speaking for google at all in this note
> and a lot of what I've written is speculation based on what I've read
> in various GAE docs as well as some meager knowledge of how relational
> DBs generally work.  And yes, I know datastore isn't a relational DB,
> but I believe that their indexing implementation likely runs into many
> of the same problems you have with indexing relational data although
> that assumption could be completely wrong.
>
> From what I can tell, the update bottleneck you're referring to is for
> updating what you would often think of as a single record if you were
> persisting one instance of your User as a single denormalized record
> in a relational schema.  I suspect this bottleneck is due to the
> datastore architecture and the way that data updates are accumulated
> (possibly grouped/keyed by PK) in a queue, which is probably read from
> like a cache if read requests come in before the data has been flushed
> into the actual storage medium and replicated to the other
> datacenters.
>
> So if each of your users were updating their own User records, I don't
> believe you'd experience that limitation which may be an artifact of
> how those in-memory queue/cache structures are managed/locked during
> updates (i.e. a new update for a record may be held until it's been
> flushed from the queue to the storage medium to prevent having to
> merge/reconcile records in the queue).  If they were all updating a
> single shared record, then I think you'd hit this pretty quick.
>
> Let's say though that your users are updating separate records...as
> your data size grows, you will probably see your update throughput
> decrease as other factors become dominant, and I believe this will
> primarily be dependent on the number and composition of the indexes on
> your data as well as the number of entities persisted.  To me, this is
> the much riskier unknown because your average index structure is
> harder to update piecewise in parallel because the index must allow
> you to order/search all of the records' indexed columns.  In an RDBMS
> like SQL Server or Oracle, you'd see some level of index locking take
> place during each transaction (maybe one page of an index) to allow
> concurrent updates to different sections of an index before the
> updates are committed, the transaction is ended and the locks are
> released.
>
> In relational persistence systems, this gets slower as the indexes
> become larger and is usually overcome with a technique like
> partitioning which, if you aren't familiar with it, sort of gives you
> a top level of your index tree where the data is actually spread into
> n groups of tables/indexes depending on some value in each record, and
> you usually pick a partition key so that data volume in each partition
> is kind of naturally balanced because rebalancing across partitions is
> expensive.  I'm not sure that any kind of similar mechanism has been
> exposed in the GAE datastore right now and so a single index declared
> for an entity type is probably realized as one big index.  I would
> hope that there's sub-index granularity for locking during updates,
> but I'm actually guessing that's not the case for a couple of reasons:
>
> 1) With most relational systems, you need to periodically rebuild the
> index or at least refresh the index statistics.  I like to simplify
> this and think of rebuilding as rebalancing the data tree for optimal
> access speed while refreshing statistics typically just helps query
> optimizers decide whether use of an index should be preferred.  On the
> GAE though, they require you to have an index for each combination of
> query parameters, so I suspect that statistics don't come into play.
> And I haven't seen a "rebuild my indexes" function in the admin UI
> although admittedly I haven't looked for one too hard so I wonder if
> they aren't trying to keep the data tree somewhat well balanced during
> each data update, which would require the entire index to potentially
> be locked.
>
> 2) I also haven't read anything yet about deadlock situations on GAE
> which can happen surprisingly easily if you're updating multiple
> indexes with enough concurrency and are using page locking.  If you
> were designing the GAE datastore service, the way to avoid that
> situation would be to lock all indexes on each data update in the same
> order every time.  You'd sacrifice a lot of throughput, but you'd
> never hit a deadlock so I suspect they've done something like this
> behind the scenes unless people just aren't using GAE heavily enough
> yet or the good people of the GAE have used some special sauce in the
> datastore service impl.
>
> So I guess what I'm trying to say is that I don't believe that you
> should be satisfied with any particular bit of performance data from
> another application because your mileage will almost certainly vary.
> I think that if you really want to know how your application would
> perform and want to find out before writing the whole app and sharing
> it with a billion users, I would recommend a very empirical approach:
>
> I'd write a sample app with an entity group where entity widths and
> indexes are those that you think will be representative of your
> deployed application and then add a simple test harness that will:
>
> a) seed data to a point that you think is representative
> b) update and query your data in what you believe will be a worst case
> scenario and then record the times
>
> I think the resulting curve of performance you see will be highly
> dependent on how you vary the seed data size and the number of
> indexes.  Of course there are more dimensions than that, such as the #
> of concurrent read operations and the # of concurrent write
> operations, that you can vary as well depending on what your
> performance requirements are.
>
> I hope this is somewhat helpful and I also hope that it's not totally
> incorrect and misleading since, as I said, it's all rampant
> speculation based on somewhat limited publicly available data.
>
> -Michael
>
> P.S.  Of course, if anyone has data including # of records,
> #/composition of indexes, # reads per hour, # writes per hour and
> latency per txn, I'd be fascinated to hear about it too!
>
> On Oct 19, 4:01 pm, Diana Cruise <diana.l.cru...@gmail.com> wrote:
>
>
>
> > This is exactly what I am talking about...in my case the User and
> > UserAddr are both in the same Entity Group.  So, are you saying that
> > my application which has a global presence in GAE can only support 25
> > simultaneous Users performing this update in under 5 seconds?
>
> > Again, I take 1-10 requests per second response and go with the avg of
> > 5/s.  Add up 25 Users simultaneously hitting this Entity Group and
> > that consumes a full 5 seconds.  So, if you have 25 Users doing the
> > same update over and over they will each have about a 5 second
> > response.
>
> > I know I am wrong because this is way LOW for a Google platform or any
> > other...I just am NOT hearing or seeing numbers that say otherwise.
>
> > If you clarify for me that this Entity Group performance stat of 1-10/s
> > is granular to the Row then we're on to something...  That would
> > tell me that my scenario above only applies if ALL Users were logged
> > into the same account!!!  If the Entity Group performance stat is
> > granular to the Row then that would mean an infinite number of Users
> > would average 5 updates per second.  Please tell me this is TRUE!
>
> > Otherwise, if this Entity Group performance stat of 1-10/s is granular
> > to the whole group (i.e. ALL rows) then the performance is dire as I
> > described originally.  Please tell me it isn't so!
>
> > On Oct 19, 11:10 am, Don Schwarz <schwa...@google.com> wrote:
>
> > > It's 1-10 updates per second per Entity Group:
> > > http://code.google.com/appengine/docs/java/datastore/transactions.htm...
>
> > > You need to break your design up into Entity Groups according to which
> > > pieces will need to be updated in a single transaction.  In the best case,
> > > each entity can be its own entity group and the only restriction is that
> > > you update each entity no more often than 1-10 times per second.
>
> > > For example, it would not be a good idea to store a global counter in one
> > > entity unless you planned to update it no more than 1-10 times per second.
> > > The solution to this is to use sharded counters:
>
> > > http://code.google.com/appengine/articles/sharding_counters.html
>
> > > On Mon, Oct 19, 2009 at 11:06 AM, Diana Cruise
> > > <diana.l.cru...@gmail.com> wrote:
>
> > > > Shawn, the 1-10/s Update figure was cited from Max Ross' I/O Video and
> > > > I've seen it in various talks/docs along the way...
>
> > > > On Oct 19, 11:03 am, Diana Cruise <diana.l.cru...@gmail.com> wrote:
> > > > > Shawn, the docs link you cite is riddled with numbers (easy to get
> > > > > lost in them and what they truly mean)...which is why I included the
> > > > > simplest of scenarios above, that being to simply add a home
> > > > > addressbook entry attached to a User.  Surely someone has a sizeable
> > > > > production system today in GAE that could share load results and real
> > > > > costs.  If no one does, then that is also very troubling.
>
> > > > > Gaurav, I assume too that reading is NOT the problem and by this post
> > > > > am hoping to get real-world numbers to a simple update transaction.
> > > > > But, we need production app feedback from the most popular apps out
> > > > > there.  Is there such a list for Java for GAE yet?  Surely, there are
> > > > > large production apps by now?
>
> > > > > On Oct 19, 2:17 am, Gaurav <ano...@gmail.com> wrote:
>
> > > > > > > GAE performs best for simultaneous read operations. So there could be
> > > > > > > virtually any no. of users reading at the same time, no issue. But when
> > > > > > > it comes to making updates performance degradation is significant.
> > > > > > > To get a better understanding of how GAE performs under heavy load, I
> > > > > > > recommend this video
>
> ...