Re: [google-appengine] Re: Help with an accurate counter with no contention...?

2009-11-28 Thread Eli Jones
Well, it sounds like you're wanting to have the aggregation task fired off
when a count event happens for a user.  So then, as you mention, you'd need
a way to check whether an aggregation task is already running.  And, in the
worst case scenario, you could have two tasks get fired off at once to
aggregate counts, before either of them had a chance to set the flag
indicating it was running.

You can give the task a name when you add it to the queue, and once a task
with a given name exists in the queue, you cannot insert a new task using
that same name.

So, the question becomes: how do you figure out what this task name should
be?

A quick and dirty guess leads me to think you could do something like:

Task Name = AggregateBobCount_ + BobsCurrentTotalCount

Thus, you would ensure that no additional tasks were fired off until the
current AggregateBobCount_1208 task was done updating the count.
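
Here's a minimal runnable sketch of that naming trick.  A plain dict stands
in for the task queue (on App Engine, taskqueue.add(name=...) raises
TaskAlreadyExistsError for a reused name), and the name scheme is just the
hypothetical one above:

```python
# Sketch: derive the task name from the user's current total count, so a
# second attempt to enqueue the same aggregation is rejected.  The dict
# stands in for the task queue; on App Engine, taskqueue.add(name=...)
# raises TaskAlreadyExistsError when the name was already used.

queue = {}  # task name -> task payload

def enqueue_aggregate(user, current_total):
    name = "Aggregate%sCount_%d" % (user, current_total)
    if name in queue:       # the real queue would raise TaskAlreadyExistsError
        return False        # duplicate suppressed
    queue[name] = {"user": user}
    return True

print(enqueue_aggregate("Bob", 1208))  # True: task enqueued
print(enqueue_aggregate("Bob", 1208))  # False: same name, rejected
print(enqueue_aggregate("Bob", 1500))  # True: total moved on, so a new name
```

Note the queue only rejects the duplicate while the name is outstanding;
on App Engine a tombstoned name stays unusable for a while after the task
completes, which actually helps here.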

But then, as you mention, what about the window between the time that the
aggregator updates the totalCount and when it flags/deletes the counted
counters?

If you lock that up with a transaction, will it affect the insertion of
other counts for Bob?  It might not, and using a transaction along with
task names could solve this issue.

Another approach is to NOT have the task fired off every time a count
event is inserted for a user.

I think having the task be recursive (each run scheduling the next) might
be the simplest to manage.

At the beginning of the day, the initial aggregator task runs (doing
aggregation for all users); once it is done, it adds a new task to the
queue with a task name like I mentioned above (this covers the extremely
unlikely chance that the task somehow gets created twice), and it sets
the delay to 60 seconds or whatever it is that you want.

So, the tasks are chained, and a new one only runs once the old one is
finished.
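
A runnable sketch of that chaining, with a list standing in for the queue
and the 60-second countdown merely recorded; on App Engine the handler's
last step would be taskqueue.add(name=..., countdown=60).  The
generation-numbered name is a hypothetical way to make each link's name
unique:

```python
# Sketch: each aggregator run does its pass, then enqueues its successor
# under a fresh name, so at most one link of the chain exists at a time.
# The list stands in for the queue and the countdown is just recorded.

pending = []  # (task_name, countdown_seconds)

def aggregator_run(generation):
    # ... aggregate pending counts for all users here ...
    successor = "AggregateAll_gen%d" % (generation + 1)
    pending.append((successor, 60))  # schedule the next link in the chain
    return generation + 1

gen = aggregator_run(0)    # initial run at the start of the day
gen = aggregator_run(gen)  # next link, fired after the countdown elapses
print(pending)
```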

The problem with this approach is: will you be wasting a lot of CPU time
having tasks run for all users trying to aggregate counts when the users
have no counts to aggregate?  That's something you'd just have to test
and see (were you to attempt this method).




On Sat, Nov 28, 2009 at 7:45 PM, peterk peter.ke...@gmail.com wrote:

 Hey Eli,

 Thanks very much for your replies.

 You're thinking along the same lines as me, although I wasn't
 considering using Expandos to store the data.

 My concern is sort of independent of this anyway - I'm worried that
 you can actually have more than one task aggregating changes to a
 counter running simultaneously.

 For example, an update is recorded for Bob's counter.

 How do we know if a task is already running to aggregate Bob's
 updates? If one isn't, we want to create one; if one is, we don't,
 because we want to avoid multiple tasks running simultaneously for one
 counter.

 So we could use a flag to indicate a task is already running: before
 starting a task, you'd check whether the flag is set. But there is a
 window where two updates could both see no flag, create two tasks, and
 then each set their flag. We could maybe use a transaction to get
 around this, but then I think (?) we have a point of contention for
 updates, as if we were updating a single counter entity, and we're
 losing the benefits of all our other work.

 So then I was thinking we could move that flag to memcache, which I
 think might solve our contention issue on the flag, using add() etc.
 However, there's then the possibility that the flag could be dropped
 by memcache prematurely. In that case, a second or third concurrent
 task for a given counter could be started up. But at least we won't
 be starting a task up for every update.

 I was thinking... maybe it isn't a problem to have a few tasks running
 concurrently for one counter if we put all our updates for a given
 counter into a single entity group. Then we could read the update, add
 it to the aggregation, and delete it in a transaction. I think then,
 with a delete in the transaction, if another task concurrently tries
 to process that update, it'll fail. So our updates will only get
 hoovered up once, by one task.

 I'm not entirely sure if this will be the case though. Will deletion
 of an entity in a transaction cause another task trying to do the same
 thing to fail? Obviously in this case we would want that behaviour so
 we could lock access to a given counter update to one task.

 On Nov 29, 12:19 am, Eli Jones eli.jo...@gmail.com wrote:
  To be fair, this method I described may be overkill.
 
  I developed it when I was thinking about how to lighten up insert
  costs to the datastore.
 
  I figured that, if one could store some of the relevant information
  in the column name (specifically, string info like whose count is
  this?), that would reduce the total size of the entity and thus
  speed up writes.
 
  It was suggested that the performance wouldn't be much different than
  just

Re: [google-appengine] Re: Help with an accurate counter with no contention...?

2009-11-28 Thread Eli Jones
Though, let me reiterate... all the round-about stuff I'm talking about is
something to consider if you don't want to try to modify the sharded
counter technique mentioned in this article:
http://code.google.com/appengine/articles/sharding_counters.html

You'd need some db.Expando model to use in place of the memcache
counting.  And each user could have 20 counters in the model.

And each time a count event happened, you would increment a random counter
for that user; and when you wanted to aggregate, you would get all the
counters for that user, add them up, and then set the TotalCount to
whatever you got.

I think that might be a worthwhile approach too.
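
A runnable sketch of that variant, with plain dicts standing in for the
db.Expando entity and the stored TotalCount; the 20-shard figure is from
above, everything else (names, layout) is illustrative:

```python
import random

# Sketch: each user gets NUM_SHARDS counter slots; a count event bumps a
# random slot (spreading write contention across entities), and
# aggregation sums the slots into the user's total and resets them.

NUM_SHARDS = 20
shards = {}  # (user, shard_index) -> pending count
totals = {}  # user -> aggregated TotalCount

def record_count(user):
    idx = random.randint(0, NUM_SHARDS - 1)  # pick a random slot
    shards[(user, idx)] = shards.get((user, idx), 0) + 1

def aggregate(user):
    pending = sum(n for (u, _), n in shards.items() if u == user)
    for i in range(NUM_SHARDS):              # clear the counted slots
        shards.pop((user, i), None)
    totals[user] = totals.get(user, 0) + pending
    return totals[user]

for _ in range(50):
    record_count("Bob")
print(aggregate("Bob"))  # 50: every event counted exactly once
```

On App Engine, record_count and the per-slot reset would each be a
datastore write; the contention win is that concurrent events rarely hit
the same slot.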


Re: [google-appengine] Re: Help with an accurate counter with no contention...?

2009-11-28 Thread Eli Jones
Oh, and duh... the first part of that article does the sharded counting
using transactions, without memcache.

So, presumably, it should do exactly what you want.

You just need to modify it to allow counting for an arbitrary number of
users.
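
The one change the article's transactional version needs is shard key
names that embed the user.  A sketch with a dict standing in for the
datastore (on App Engine the read-increment-write would run inside
db.run_in_transaction on the entity fetched by that key_name); the naming
scheme is hypothetical:

```python
import random

# Sketch: generalize the article's sharded counter to many users by
# embedding the user in each shard's key_name.  The dict stands in for
# the datastore; on App Engine the increment would happen inside
# db.run_in_transaction on the entity with this key_name.

NUM_SHARDS = 20
store = {}  # key_name -> shard count

def increment(user):
    index = random.randint(0, NUM_SHARDS - 1)
    key_name = "%s_shard%d" % (user, index)       # e.g. "Bob_shard3"
    store[key_name] = store.get(key_name, 0) + 1  # transactional on GAE

def get_count(user):
    # Sum every shard belonging to this user.
    return sum(n for k, n in store.items()
               if k.startswith(user + "_shard"))

increment("Bob"); increment("Bob"); increment("Alice")
print(get_count("Bob"))    # 2
print(get_count("Alice"))  # 1
```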
