I am slightly leaning toward going back to good old sharded
counters...

The reason I was looking at alternatives was some things Brett
Slatkin said in his Google I/O presentation on task queues - that
sharded counters were really just a band-aid at the time, and that
they still have problems with timeouts.

But I guess as long as you keep enough shards for a given counter, it
shouldn't time out? We can presumably catch contention errors and,
within a limited number of retries, create a new shard for the update
to write to if necessary, so we never lose one.
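
Something like this is what I have in mind for the retry/add-a-shard
part (a rough, untested sketch; CounterShard, CounterConfig and
increment() are names I'm making up here, not anything from the
sharding article):

import random

from google.appengine.ext import db


class CounterShard(db.Model):
    name = db.StringProperty(required=True)   # which counter this shard belongs to
    count = db.IntegerProperty(default=0)


class CounterConfig(db.Model):
    num_shards = db.IntegerProperty(default=5)


def increment(name, retries=3):
    config = CounterConfig.get_or_insert(name)
    shards = config.num_shards

    def txn(index):
        key_name = '%s-%d' % (name, index)
        shard = CounterShard.get_by_key_name(key_name)
        if shard is None:
            shard = CounterShard(key_name=key_name, name=name)
        shard.count += 1
        shard.put()

    for _ in range(retries):
        try:
            db.run_in_transaction(txn, random.randint(0, shards - 1))
            return
        except (db.TransactionFailedError, db.Timeout):
            # Contention on that shard: widen the pool and retry, so the
            # update lands somewhere instead of being dropped.
            shards += 1
            config.num_shards = shards
            config.put()
    raise db.Error('increment(%s) failed after %d retries' % (name, retries))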

I do agree with Brett that it's a bit heavy-handed, but if it
guarantees accuracy...

As far as I can see, the options are these:

Write-Behind Cache:
+ Control over DB write rate - 100 counter updates can become one DB
write, etc. (sketch after this list)
+ No redundancy in counter DB storage
+ Lower latency in the update request
- Busy counters could keep a task in the queue constantly. Lots of
busy counters (e.g. one for every user) = lots of task entries =
problem with daily limit and lagging db update due to 10 task/sec
invocation limit (?)
- Lots of busy counters = thrashing of the cache and higher potential
for a counter to disappear from the cache before it's saved (?)
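
For comparison, the write-behind version I have in mind is roughly
this (again untested; the model, function names and the
/flush_counter URL are all made up, and I'm assuming the labs
taskqueue API from the current SDK):

import time

from google.appengine.api import memcache
from google.appengine.api.labs import taskqueue
from google.appengine.ext import db


class Counter(db.Model):
    count = db.IntegerProperty(default=0)


def record_hit(name):
    # Cheap in-request work: bump a cached delta only.
    if memcache.incr('delta:' + name) is None:
        # Key wasn't in the cache yet; small race here, which is exactly
        # the "update can vanish from the cache" worry above.
        memcache.add('delta:' + name, 1)
    try:
        # One named task per counter per minute, so many updates in that
        # window become a single datastore write.  Assumes 'name' only
        # contains characters legal in a task name.
        taskqueue.add(name='flush-%s-%d' % (name, int(time.time() // 60)),
                      url='/flush_counter', params={'name': name},
                      countdown=60)
    except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
        pass  # a flush for this window is already queued (or already ran)


def flush(name):
    # Task handler: move the cached delta into the datastore.
    delta = memcache.get('delta:' + name) or 0
    if delta:
        memcache.decr('delta:' + name, delta)

        def txn():
            counter = Counter.get_by_key_name(name) or Counter(key_name=name)
            counter.count += delta
            counter.put()

        db.run_in_transaction(txn)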

Sharded Counters:
+ No impact on task queue quota
+ More reliable, no dependency on a cache
+ Can maybe scale better with a larger number of counters since you're
not dealing with a limited cache size and task quota (?)
+ Can still deal with contention...
-...as long as you manage the number of shards. Contention (and thus a
dropped update) possible if number of shards is too low for a counter
- More storage space required (n entities for n shards vs one counter
entity)
- DB write on every update; more latency in the request
- Problem with reading the counter total if the number of shards goes
beyond 1000? Thus limited to 1000 shards = ~5000 counter updates/sec?
Maybe this could be overcome during such busy times with a single
total_count entity that aggregates all shards every now and then via a
task (i.e. a task to read the shard values, even for n shards > 1000,
and write the aggregate to a single total_count entity). The counter
total would lag a bit, but that's probably OK if it's being updated
this much. (Sketch after this list.)
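
The aggregate-the-shards workaround would look something like this
(sketch only; CounterShard is from the earlier snippet, CounterTotal
is a made-up name, and aggregate() would run from a task or cron):

from google.appengine.ext import db


class CounterTotal(db.Model):
    count = db.IntegerProperty(default=0)


def aggregate(name):
    # Run from a task: sum the shards and store one total that readers
    # can fetch cheaply.  With more than 1000 shards you'd have to page
    # through them by key rather than rely on a single fetch().
    shards = CounterShard.all().filter('name =', name).fetch(1000)
    total = sum(shard.count for shard in shards)
    CounterTotal(key_name=name, count=total).put()


def get_count(name):
    # Readers hit the aggregated entity, not the shards, so the read
    # cost stays flat; the value lags by however often aggregate() runs.
    total = CounterTotal.get_by_key_name(name)
    return total.count if total else 0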

The examples we were thinking of, where we write deltas to the
datastore, would be more reliable than the memcache approach I think,
but would still suffer from the problem of using an awful lot of tasks
with many counters (and thus also being limited to updating ten
counters per second because of the 10 task/sec run limit on the
queues?). If you had a hundred thousand active counters, that means
potentially tens of thousands of counters with tasks in the queue at a
given time, which means a counter could be waiting maybe half an hour
or more between updates: 20,000 counters updated in a given window -
feasible for a large, active site with a counter per user - is 20,000
tasks, and at 10 tasks/sec that's 2,000 seconds, over half an hour.

Any thoughts? Agreement, disagreement? I'm thinking sharded counters
with careful safeguards for concurrency exceptions may be the way to
go, but I'm wondering if there are any more monsters waiting down the
line with that choice...

On Nov 29, 1:37 am, Eli Jones <eli.jo...@gmail.com> wrote:
> oh, and duh.. the first part of that article does the sharded counting..
> using transactions without memcached.
>
> So, presumably, it should do exactly what you want.
>
> you just need to modify that to allow counting for an arbitrary number of
> users.
>
> On Sat, Nov 28, 2009 at 8:30 PM, Eli Jones <eli.jo...@gmail.com> wrote:
> > Though, let me re-iterate... all the round-about stuff I'm talking about is
> > something to consider if you don't want to try and modify the sharded
> > counter technique mentioned in this article:
> >http://code.google.com/appengine/articles/sharding_counters.html
>
> > You'd
> > need some db.Expando model that you used to replace the Memcached counting.
> >  And each user could have 20 counters in the model..
>
> > And each time a count event happened, you would increment a random counter
> > for that User.. and when you wanted to aggregate, you would get all the
> > counters for that user, add them up and then change the TotalCount to
> > whatever it was you got.
>
> > I think that might be a worthwhile approach too.
>
> > On Sat, Nov 28, 2009 at 8:20 PM, Eli Jones <eli.jo...@gmail.com> wrote:
>
> >> Well, it sounds like you're wanting to have the aggregation task fired off
> >> when a count event happens for a user.  So, then, as you mention, you'd 
> >> need
> >> a way to check if there wasn't already an aggregation task running.  And, 
> >> in
> >> the worst case scenario, you could have two tasks get fired off at once.. 
> >> to
> >> aggregate counts..  before the either of the tasks had a chance to set the
> >> flag to indicate they were running.
>
> >> You can give the task a name when you add it to the queue.. and once a
> >> task exists in the Queue with given name.. you cannot insert a new task
> >> using that same name.
>
> >> So, the question becomes, how do you figure out what this Task Name should
> >> be..?
>
> >> A quick and dirty guess leads me to think you could do something like.
>
> >> Task Name = "AggregateBobCount_" + BobsCurrentTotalCount
>
> >> Thus, you would ensure that no additional tasks were fired off until the
> >> current "AggregateBobCount_1208" task was done updating the count..
>
> >> But, then .. as you mention, what about that window between the time that
> >> the Aggregator updates the totalCount and flags,deletes the counted
> >> counters?
>
> >> If you lock it up with a transaction, will that affect the insertion of
> >> other counts for Bob?  It might not.. and using a transaction along with
> >> Task Names could solve this issue.
>
> >> Another approach is to have the Task NOT be generally fired off anytime a
> >> count event is inserted for a User.
>
> >> I think having the Task be configured to be recursive might be the most
> >> simple to manage.
>
> >> At the beginning of the day, the initial Aggregator task runs (doing
> >> aggregation for all users), once it is done, it adds a new Task to the 
> >> Queue
> >> with a Task Name like I mentioned above (this would cover the extremely
> >> random chance that the Task ended up getting created twice somehow), and it
> >> sets the delay to 60 seconds or whatever it is that you want.
>
> >> So, the task is chained.. and a new one only runs once an old one is
> >> finished.
>
> >> The problem with this approach is... will you be wasting a lot of CPU time
> >> by having tasks running for all Users trying to aggregate counts if the
> >> users have no counts to aggregate?  That's something you'd just have to 
> >> test
> >> to see.. (were you to attempt this method).
>
> >> On Sat, Nov 28, 2009 at 7:45 PM, peterk <peter.ke...@gmail.com> wrote:
>
> >>> Hey Eli,
>
> >>> Thanks very much for your replies.
>
> >>> You're thinking along the same lines as me, although I wasn't
> >>> considering using Expandos to store the data.
>
> >>> My concern is sort of independent of this anyway - i'm worried that
> >>> you can actually have more than one task aggregating changes to a
> >>> counter running simultaneously.
>
> >>> For example, an update is recorded for Bob's counter.
>
> >>> How do we know if a task is already running to aggregate Bob's
> >>> updates? If it's not we want to create one, but if there is already
> >>> one we don't, because we want to try and avoid multiple tasks running
> >>> simultaneously for one counter.
>
> >>> So we could use  a flag to indicate if a task is already running. So
> >>> before starting a task you could look to see if a flag is set. But
> >>> there is a window there where two updates could see no flag, and
> >>> create two tasks, and then create their flag. We could maybe use a
> >>> transaction to get around this, but then I think (?) we have a point
> >>> of contention for updates as if we were updating a single counter
> >>> entity, and we're losing the benefits of all our other work.
>
> >>> So then I was thinking we could move that flag to memcache. Which I
> >>> think might solve our contention issue on the flag using add() etc.
> >>> However there's then the possibility that the flag could be dropped by
> >>> memcache prematurely. In that case, a second or third concurrent task
> >>> for a given counter could be started up. But at least we won't be
> >>> starting a task up for every update.
>
> >>> I was thinking...maybe this isn't a problem to have a few tasks
> >>> perhaps running concurrently for one counter if we put all our updates
> >>> for a given counter into a single entity group. Then we could read the
> >>> update, add it to the aggregation, and delete it in a transaction. So
> >>> I think then, with a transaction with a delete in it, if another task
> >>> concurrently tries to process that update, it'll fail. So our updates
> >>> will only get hoovered up once by one task.
>
> >>> I'm not entirely sure if this will be the case though. Will deletion
> >>> of an entity in a transaction cause another task trying to do the same
> >>> thing to fail? Obviously in this case we would want that behaviour so
> >>> we could lock access to a given counter update to one task.
>
> >>> On Nov 29, 12:19 am, Eli Jones <eli.jo...@gmail.com> wrote:
> >>> > To be fair, this method I described may be overkill.
>
> >>> > I developed it when I was thinking about how to lighten up insert costs
> >>> to
> >>> > the datastore.
>
> >>> > I figured that, if one could store some of the relevant information in
> >>> the
> >>> > Column name (specifically, string info like "who's count is this?"),
> >>> that
> >>> > would reduce the total size of the entity.. and thus speed up writes.
>
> >>> > It was suggested that the performance wouldn't be much different than
> >>> just
> >>> > having models like this:
>
> >>> > class UserCounter(db.Model):
> >>> >   Username = db.StringProperty(required = True)
>
> >>> > class UserTotalCount(db.Model):
> >>> >   Username = db.StringProperty(required = True)
> >>> >   Count = db.IntegerProperty(required = True)
>
> >>> > Then, you'd just
> >>> > Select __key__ from UserCounter Where Username = 'Bob'
> >>> > and
> >>> > Select * from UserTotalCount Where Username = 'Bob'
>
> >>> > To do your counting and updating..
>
> >>> > Though, my intuition is that doing it this way would take more
> >>> processing
> >>> > power (and maybe lead to some contention) since you're inserting
> >>> > StringProperties into one big column when putting UserCounter events.
>
> >>> > Here is the initial thread covering what I was trying to figure out:
> >>> > Expando and Index
> >>> > Partitioning<
> >>>http://groups.google.com/group/google-appengine/browse_thread/thread/..
> >>> .>
>
> >>> > On Sat, Nov 28, 2009 at 6:46 PM, Eli Jones <eli.jo...@gmail.com>
> >>> wrote:
> >>> > > I think there would have to be some divergence between what the
> >>> counter
> >>> > > should be and what the user will actually see at any given time..
> >>> since if
> >>> > > you have a high rate of counts happening for a user.. you'd get
> >>> contention
> >>> > > when trying to update all the count each time the event being counted
> >>> > > happened.
>
> >>> > > Of course, you know that part.. since they have all these sharding
> >>> > > examples.
>
> >>> > > So, you gotta decide how stale the count can be...
>
> >>> > > Once you decide that, since you don't seem to want any potential loss
> >>> of
> >>> > > counts... you'd probably need two Models to do counting for each
> >>> user.
> >>> > > (memcache would be out since that allows potential lost counts)
>
> >>> > > One for each individual count inserted (call it UserCounter) and one
> >>> for
> >>> > > the count that the user sees (UserTotalCount).
>
> >>> > > So, if a count-event happens you insert a new row into UserCounter.
>
> >>> > > Then you should have a task that runs that selects __key__ from
> >>> > > UserCounter, finds out how many entities were returned, updates the
> >>> > > UserTotalCount model with the additional counts, and once that update
> >>> is
> >>> > > successful, it deletes the keys, entities for those counts that it
> >>> selected.
> >>> > >  AND then, once all of that is successful, have the Task essentially
> >>> > > schedule itself to run again in N seconds or however long you've
> >>> decided to
> >>> > > give it.
>
> >>> > > Presumably, doing it this way would allow you to make sure that the
> >>> > > counterupdate task is running one at a time for each user (since it
> >>> can only
> >>> > > start itself up again after it is done counting).. and you would
> >>> avoid write
> >>> > > contention since the task is the only thing updating the user's
> >>> counter.
>
> >>> > > Probably, you could set up two Expando models to do this for all
> >>> users..
> >>> > > and each time a new user was created, you'd add a new Column to the
> >>> Expando
> >>> > > models for that user.
>
> >>> > > so, you'd have these initial definitions:
>
> >>> > > class UserCounter(db.Expando):
> >>> > >     BobCountEvent = db.BooleanProperty(required=True)
>
> >>> > > class
>
> ...
>
