[google-appengine] Re: counting

Peter Recore Mon, 27 Oct 2008 13:14:50 -0700

I started playing around with a solution this weekend, and trying to
make it safe for concurrent users but still fast is definitely the
tricky part.  My goal is to limit the need for transactions to the
lower parts of the tree in the majority of cases.



On Oct 27, 2:38 pm, Ben Nevile <[EMAIL PROTECTED]> wrote:
> Hi Peter,
>
> Thanks first for re-stating the problem with more precise language!
>
> As for your solution, sharding into many different ranking groups is
> an interesting approach.  One could imagine maintaining a tree of
> ranking groups; searching/inserting/querying would require log N reads/
> writes.  All of it would have to be wrapped up in transactions, which
> makes me a bit queasy. Not sure I want to venture down that path when
> a cheap server running MySQL can handle the problem so nicely.
>
> I have also been evaluating CouchDB as a next-gen database solution.
> Hopefully Google is tracking their progress as well - as the world
> leaders in map-reduce implementations, one would expect datastore to
> boast features similar to how CouchDB handles its views.  To do a
> histogram in CouchDB,
>
> map: function(doc) { emit(score, 1); }
> reduce: function(keys, values) { return sum(values) }
>
> CouchDB efficiently diffs this view index with every incoming write,
> so finding a rank is a quick query.
>
> Ben
>
> On Oct 25, 5:27 pm, Peter Recore <[EMAIL PROTECTED]> wrote:
>
> > I'll try to state a more formal version of the problem, then offer my
> > suggestion.
>
> > Problem:
>
> > Assume that there is an entity type called Player, and each Player has
> > a score.  A score is an integer.  For any given Player P with a score
> > of X, we want to know how many other Players have scores that are
> > greater than or equal to X.  This is their Rank.
>
> > Let's say we have the following Players and scores:
>
> > Peter 30
> > Ben  27
> > Sylvain 42
> > Marzia 50
> > Ryan 12
>
> > Peter's Rank would be 3 because Marzia, Sylvain and Peter all have
> > scores greater than or equal to Peter's score of 30.
> > Marzia's Rank would be 1, and Ryan's would be 5.  The tricky part is
> > that Ben needs to do this efficiently for hundreds of thousands of
> > items ( with a fairly frequent number of updates i assume)
>
> > Getting this kind of answer is relatively simple on a SQL database,
> > though I don't honestly know how scalable the typical count(*) > X
> > solutions are.  With the proper indexes, they could probably be very
> > fast, but i don't know if you'd get lots of contention on writing to
> > that table/index.
>
> > This does seem like a problem that any game with the concept of 'high
> > score' built in app engine would need to solve. (actually, this
> > problem probably generalizes even further)  I'm pretty sure I will
> > need something like this at some point myself.  So I might as well
> > take a stab at it now. Here are my initial thoughts...  Encouraging or
> > disparaging comments welcome.
>
> > To solve this problem with the GAE datastore, I propose building a
> > ShardedRanker.
> > The lowest level item we need to consider is a RankedItem.  A
> > RankedItem has a key, a value, and a reference to a RankingGroup.  The
> > key would be a player's id, and the value would be a player's score.
>
> > A ShardedRanker would have references to a number of RankingGroups.
> > Each RankingGroup would contain a list of RankedItems, and be
> > responsible for knowing the relative rank of each item it contains.
> > Each RankingGroup would also be responsible for knowing the highest
> > and lowest scores its items contain.
>
> > When we want to know the rank of a particular item, we can get its
> > overall rank by combining its relative rank within its RankingGroup
> > with the total number of items contained in higher ranked Groups.
> > This will require 1 datastore read to get the info from the
> > RankingGroup, plus one read for each of the RankingGroups ranked
> > higher than the 'target' group.
>
> > When we insert a new RankedItem, we will need to update the
> > RankingGroup it belongs to, and maybe the index which tells us which
> > Group contains which ranges of scores.  At some point, if a
> > RankingGroup gets to big, it would probably be split.
>
> > There should be lots of opportunity to memcache things here, reducing
> > the need for datastore reads.  As the total number of items increases,
> > it might be necessary to create multiple levels of RankingGroups.
>
> > -peter
>
> > On Oct 24, 5:49 pm, yejun <[EMAIL PROTECTED]> wrote:
>
> > > His problem is a count needed for every single user. So a single
> > > change will cause a massive update on all count.
> > > On a traditional database it is the position on the index.
>
> > > On Oct 24, 5:33 pm, Sylvain <[EMAIL PROTECTED]> wrote:
>
> > > > Sorry, I don't understand exactly the problem but if you want to have
> > > > this SQL
>
> > > > select count(*) from entries where points > 100
>
> > > > You can create many "sharded counter"
> > > > and from the code use this :
>
> > > > increment("point_%i" % point)
>
> > > > Then, if you want to count, just : get_count("point_%i" % point)
>
> > > > It will create a lot of of "counter" but it will works fine.
>
> > > > Then if you want to count for points > 100
> > > > You will have to do many any times : get_count("point_%i" % point) for
> > > > point from x to y
> > > > else you can create steps : a counter from 100 to 150 , etc,...
>
> > > > I think the only solution to do count(*) (if count(*) > 1000) is to
> > > > use sharded counter
>
> > > > if count(*) < 1000 then you can use the "count" keyword.
>
> > > > Regards.
>
> > > > On 24 oct, 21:39, Ben Nevile <[EMAIL PROTECTED]> wrote:
>
> > > > > Hi Sylvain - thanks for the pointer, and although I now know a cool
> > > > > sharding hack for a shared resource, I don't see how the technique can
> > > > > be applied to my problem.  To summarize, I could have a model indexed
> > > > > by a single entity called "points", and I want to be able to tell
> > > > > someone how they rank in terms of points.  For someone with say 100
> > > > > points, in SQL I would do something like
>
> > > > > select count(*) from entries where points > 100
>
> > > > > Ben
>
> > > > > On Oct 24, 9:51 am, Sylvain <[EMAIL PROTECTED]> wrote:
>
> > > > > > Check this 
> > > > > > :http://groups.google.com/group/google-appengine/browse_frm/thread/e13...
>
> > > > > > and code here :http://paste.blixt.org/1581
>
> > > > > > On 24 oct, 08:52, Ben Nevile <[EMAIL PROTECTED]> wrote:
>
> > > > > > > Hi,
>
> > > > > > > I have recently started using GAE for components of a large 
> > > > > > > sports-
> > > > > > > related Facebook app.  One of the contests has hundreds of 
> > > > > > > thousands
> > > > > > > of participants, and I need to be able to tell a user at any given
> > > > > > > time what place they're in.  eg, you are 34,728th out of 234,829
> > > > > > > participants.
>
> > > > > > > After spending some time with the docs and browsing this group, it
> > > > > > > appears that using the datastore there's no way to accomplish this
> > > > > > > relatively mundane task.  So do I have to keep a mysql database 
> > > > > > > active
> > > > > > > on some other host just so that I can efficiently do this type of
> > > > > > > analysis?
>
> > > > > > > Ben
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

[google-appengine] Re: counting

Reply via email to