Hey Eli,

Thanks very much for your replies.

You're thinking along the same lines as me, although I wasn't
considering using Expandos to store the data.

My concern is somewhat independent of this anyway - I'm worried that
you can actually end up with more than one task aggregating changes to
a counter running simultaneously.

For example, an update is recorded for Bob's counter.

How do we know if a task is already running to aggregate Bob's
updates? If there isn't one we want to create it, but if one is already
running we don't, because we want to avoid multiple tasks running
simultaneously for one counter.

So we could use a flag to indicate that a task is already running:
before starting a task, we check whether the flag is set. But there's a
window where two updates could both see no flag, create two tasks, and
then each set the flag. We could use a transaction to get around this,
but then I think (?) the flag becomes a point of contention for
updates, just as if we were updating a single counter entity, and we
lose the benefit of all our other work.
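
To make that concrete, here's roughly the transactional flag I'm
picturing (model and function names are just made up for illustration).
The check-and-set has to happen in one transaction to close the window,
and that's exactly what makes the flag entity a serialization point:

from google.appengine.ext import db

class AggregatorFlag(db.Model):
    # One flag entity per counter, keyed by the counter's name.
    running = db.BooleanProperty(default=False)

def claim_flag(counter_name):
    """Returns True if we won the right to enqueue the aggregation task."""
    def txn():
        flag = AggregatorFlag.get_by_key_name(counter_name)
        if flag is not None and flag.running:
            return False  # a task is already scheduled for this counter
        AggregatorFlag(key_name=counter_name, running=True).put()
        return True
    # Every update for this counter funnels through this one entity
    # group, so we're back to the contention we were trying to avoid.
    return db.run_in_transaction(txn)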

So then I was thinking we could move that flag to memcache, which I
think would solve the contention on the flag by using add() and so on.
However, there's then the possibility that memcache drops the flag
prematurely. In that case, a second or third concurrent task for a
given counter could be started up. But at least we wouldn't be starting
a task up for every update.
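
Something like this is what I have in mind for the memcache version
(the handler URL and key names are just illustrative). add() only
succeeds for the first caller, so it acts as a cheap test-and-set,
with the caveat that the flag can be evicted early:

from google.appengine.api import memcache
from google.appengine.api.labs import taskqueue  # taskqueue currently lives under labs

def maybe_schedule_aggregation(counter_name):
    # add() is atomic and returns True only for the caller that actually
    # sets the key; everyone else sees False and does nothing.
    if memcache.add('aggregating:%s' % counter_name, True, time=60):
        taskqueue.add(url='/tasks/aggregate_counter',
                      params={'counter': counter_name},
                      countdown=60)
    # If memcache evicts the key early, a second task may still get
    # scheduled, but at least we avoid one task per update.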

I was thinking... maybe having a few tasks running concurrently for
one counter isn't a problem if we put all the updates for a given
counter into a single entity group. Then we could read each update, add
it to the aggregate, and delete it, all in one transaction. With the
delete inside the transaction, I think that if another task
concurrently tries to process the same update, its transaction will
fail. So our updates will only get hoovered up once, by one task.
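
Here's roughly the shape of what I mean, assuming the deltas are
created as children of their counter so everything lives in one entity
group (model names are mine, just for illustration). The read, the
increment and the delete all happen in one transaction, so two tasks
racing on the same counter should conflict rather than both applying
the same deltas:

from google.appengine.ext import db

class Counter(db.Model):
    count = db.IntegerProperty(default=0)

class CounterDelta(db.Model):
    # Always created with parent=<the Counter entity> so it shares
    # the counter's entity group.
    delta = db.IntegerProperty(required=True)

def aggregate(counter_key):
    def txn():
        counter = db.get(counter_key)
        # Inside a transaction only ancestor queries are allowed, which
        # is fine: we only want this counter's deltas anyway.
        deltas = CounterDelta.all().ancestor(counter_key).fetch(500)
        for d in deltas:
            counter.count += d.delta
        counter.put()
        db.delete(deltas)
        return len(deltas)
    return db.run_in_transaction(txn)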

I'm not entirely sure this will be the case, though. Will deleting an
entity in a transaction cause another task trying to do the same thing
to fail? Obviously that's the behaviour we'd want here, so that access
to a given counter update is effectively locked to one task.

On Nov 29, 12:19 am, Eli Jones <eli.jo...@gmail.com> wrote:
> To be fair, this method I described may be overkill.
>
> I developed it when I was thinking about how to lighten up insert costs to
> the datastore.
>
> I figured that, if one could store some of the relevant information in the
> Column name (specifically, string info like "whose count is this?"), that
> would reduce the total size of the entity.. and thus speed up writes.
>
> It was suggested that the performance wouldn't be much different than just
> having models like this:
>
> class UserCounter(db.Model):
>   Username = db.StringProperty(required = True)
>
> class UserTotalCount(db.Model):
>   Username = db.StringProperty(required = True)
>   Count = db.IntegerProperty(required = True)
>
> Then, you'd just
> Select __key__ from UserCounter Where Username = 'Bob'
> and
> Select * from UserTotalCount Where Username = 'Bob'
>
> To do your counting and updating..
>
> Though, my intuition is that doing it this way would take more processing
> power (and maybe lead to some contention) since you're inserting
> StringProperties into one big column when putting UserCounter events.
>
> Here is the initial thread covering what I was trying to figure out:
> Expando and Index
> Partitioning<http://groups.google.com/group/google-appengine/browse_thread/thread/...>
>
> On Sat, Nov 28, 2009 at 6:46 PM, Eli Jones <eli.jo...@gmail.com> wrote:
> > I think there would have to be some divergence between what the counter
> > should be and what the user will actually see at any given time.. since if
> > you have a high rate of counts happening for a user.. you'd get contention
> > when trying to update the count each time the event being counted
> > happened.
>
> > Of course, you know that part.. since they have all these sharding
> > examples.
>
> > So, you gotta decide how stale the count can be...
>
> > Once you decide that, since you don't seem to want any potential loss of
> > counts... you'd probably need two Models to do counting for each user.
> > (memcache would be out since that allows potential lost counts)
>
> > One for each individual count inserted (call it UserCounter) and one for
> > the count that the user sees (UserTotalCount).
>
> > So, if a count-event happens you insert a new row into UserCounter.
>
> > Then you should have a task that runs that selects __key__ from
> > UserCounter, finds out how many entities were returned, updates the
> > UserTotalCount model with the additional counts, and once that update is
> > successful, it deletes the keys/entities for those counts that it selected.
> >  AND then, once all of that is successful, have the Task essentially
> > schedule itself to run again in N seconds or however long you've decided to
> > give it.
>
> > Presumably, doing it this way would allow you to make sure that the
> > counterupdate task is running one at a time for each user (since it can only
> > start itself up again after it is done counting).. and you would avoid write
> > contention since the task is the only thing updating the user's counter.
>
> > Probably, you could set up two Expando models to do this for all users..
> > and each time a new user was created, you'd add a new Column to the Expando
> > models for that user.
>
> > so, you'd have these initial definitions:
>
> > class UserCounter(db.Expando):
> >     BobCountEvent = db.BooleanProperty(required=True)
>
> > class UserTotalCount(db.Expando):
> >     BobTotalCount = db.IntegerProperty(required=True)
>
> > Then, each time user Bob has a count event you do:
>
> > bobCount = UserCounter(BobCountEvent = True)
> > bobCount.put()
>
> > And when you want to update Bob's Total Count, you do (I have to do this
> > quasi-pseudo since it isn't trivial to do):
>
> > results = Select __key__ from UserCounter Where BobCountEvent = True
> > if len(results) > 0:
> >   countResult = Select * from UserTotalCount Where BobTotalCount >= 0
> >   if len(countResult) > 0:
> >     countResult[0].BobTotalCount += len(results)
> >     db.put(countResult[0])
> >   else:
> >     newCount = UserTotalCount(BobTotalCount = len(results))
> >     newCount.put()
> >   db.delete(results)
>
> > Now, you might wonder... how do I do puts for variable user names? (You
> > can't just create new put functions for each new user)..  In Python, you
> > can use exec to do that..
>
> > I have not tested how any of this performs... having an expando model may
> > hurt performance.. but, I don't think so, and I know the method works for
> > other things (not sure how it'd do on this counter method).
>
> > See here for Google's sharded counts example:
> > http://code.google.com/appengine/articles/sharding_counters.html
>
> > On Sat, Nov 28, 2009 at 5:17 PM, peterk <peter.ke...@gmail.com> wrote:
>
> >> Hey all,
>
> >> I've been looking at the Task Queue API and counter example. In my
> >> app, each user will have a couple of counters maintained for them,
> >> counting various things.
>
> >> Thing is, these counters need to be accurate. So I'm not sure if the
> >> example given for the Task Queue API using memcache would be
> >> appropriate for me - it would not be good, really, if my counters were
> >> to be inaccurate. My users would expect accurate counts here.
>
> >> So I was thinking about a sort of modified version whereby each change
> >> to the counter would be stored in the DS in its own entity. E.g. an
> >> entity called 'counter_delta' or some such, which holds the delta to
> >> apply to a counter, and the key to the counter that the delta is to be
> >> applied to.
>
> >> Then, using the Task Queue I guess I could hoover up all these delta
> >> entities, aggregate them, and apply them in one go (or in batches) to
> >> the counter. And then delete the delta entries.
>
> >> Thus the task queue is the only thing accessing the counter entity,
> >> and it does so in a controllable fashion - so no real contention. Each
> >> change to the counter gets written to the store in its own
> >> counter_delta entity, so no contention there either. And because the
> >> deltas are stored in DS and not in memcache, it should be much more
> >> reliable.
>
> >> However, I'm not entirely sure how I should actually go about
> >> implementing this, or specifically, the task queue end of things.
>
> >> I'm thinking if there is a change to a counter to be made, I should
> >> check if there's a task already running for this counter, and if so,
> >> not to insert any new task, and let the currently running task take
> >> care of it. If there is no running task for this counter, I would
> >> instead create one, and set it to run in - say - 60 seconds, allowing
> >> time for further deltas for this counter to accumulate so the task can
> >> take care of more than just one delta. This would mean the counter
> >> might be inaccurate for up to 60 seconds, but I can live with that.
>
> >> But what I'm wondering is, how can I implement this 'don't insert a
> >> new task if one for this counter is already in the queue or running'
> >> behaviour?
>
> >> I was thinking initially that I could give the task a name based on
> >> the counter, so that only one such task can exist at any one time.
> >> However, I believe we have no control over when that name is freed up
> >> - it isn't necessarily freed up when the task ends, I believe names
> >> can be reserved for up to 7 days (?) So that wouldn't work. If a name
> >> could be freed up when a task was really finished then this could
> >> work, I think.
>
> >> I was thinking also I could store a flag so that when a counter_delta
> >> is created, I'd look to see if a flag for this counter was present,
> >> and if so, do nothing. If not, create the task, and create the flag.
> >> Then when the task was all done and didn't see any more
> >> counter_deltas, it'd delete the flag. But I'm worried that there could
> >> be race conditions here, and some deltas might get overlooked as a
> >> result? And if I were to use transactions on such a flag, would I not
> >> fall into the same contention trap I'm trying to avoid in the first
> >> place?
>
> >> Help? :| Thanks for any advice/insight...
>
