Good post, David! Just one thing to add - there is in fact an existing
library that already implements this:
http://code.google.com/p/google-app-engine-ranklist/

-Nick Johnson


On Thu, Nov 3, 2011 at 12:54 AM, David Whittaker <dpwhitta...@gmail.com>wrote:

> I doubt BigTable keeps track of index positions... it seems from its
> architecture (distributed sorted list of keys) that it would require
> updating every entity in the table on every insert to keep track of
> each entity's position.  So, the question becomes: do you really need
> this information, or can your application be restructured to use the
> index key itself for whatever purpose you have in mind for the index
> key's position?
>
> I can think of at least one example where you might really need this
> information: a ranking system that ranks a huge list of entities based
> on where they fall in the sorted list of some key (probably a
> numerical "strength" value calculated from the rest of the entity's
> properties).  Let's say the entities represent Users on some
> competitive flash gaming site, and JoeNewUser, who has been a member
> of the site for 3 weeks and played 734 games of Tower Defense, wants
> to know what his rank is.  Joe's rank is 73,248, but we don't want to
> have to look at the 73,247 people ahead of him (sorted by TotalScore
> DESC) to find that out.  If your actual use case has nothing in common
> with this, let me know, but I think it at least parallels the general
> gist of what you are asking for.
>
> What you are looking for is a B-Tree, with the addition of subtree
> size in the subtree pointers:
> http://en.wikipedia.org/wiki/B-tree
>
> The data model for that might look something like this:
> class BTree(Model):
>  totalScores = ListProperty(int) #a sorted list totalScores, up to
> some limit long
>  user = ListProperty(Key(User))  #same length as totalScores, points
> to the User the score belongs to
>  pointers = ListProperty(Key(BTree))  #pointers to child Nodes
>  subTreeCount = ListProperty(int)     #number of keys in all child
> Nodes below this pointer
>  subTreeSum = ListProperty(int)       #a prefix sum (running total)
> of subTreeCount +1 for each totalScore
>
> So it's really just a bunch of list properties.  You set how big you
> want the list to get (bigger is better - fewer datastore calls for
> lookups - but keep in mind the 1 MB limit).  Follow the procedures on
> the wiki page linked above, but on insert, add one to the subTreeCount
> for the each pointer you follow (and 1 to every subTreeSum from there
> to the right on each node).  To find the index of an item you are
> searching for, keep track of the subTreeSum of all the subtrees to the
> left of the subtree you are following, plus the nodes you pass along
> the way, and add them all up to get the index of the final node.
> Don't forget to binary search the totalScores list - there's no need
> to look at every item in it, since it's already sorted.  You may have
> to do some linear searching at the end if several users have the same
> totalScore, but you can avoid that by using fixed point numbers and
> appending the UserID to the key to ensure uniqueness while still
> sorting correctly (lexicographically).  Split operations are
> relatively simple too, since you have all the data you need to
> populate the new subTreeCounts and subTreeSums... it's just
> bookkeeping.
>
> Looking at the complexity of this, if you have 100 keys per BTree
> node, and a million entities, you can find the one you are looking for
> and its index in 3 sequential datastore calls, while comparing to 8
> keys per node.  1000 keys per page gets you down to 2 datastore
> lookups and 10 comparisons per page, but you start spending more and
> more CPU time serializing and deserializing the data in and out of the
> datastore, so you'll have to play around with it to find the
> Goldilocks number.  Good luck, and let me know if you need a more
> detailed explanation (though it will probably take a few days).
>
> David
>
>
> On Nov 1, 3:18 am, de Witte <jcreator.xi...@gmail.com> wrote:
> > Hi,
> >
> > I'm looking for the following:
> >
> > Is it possible to retrieve the position of an entity in an index table?
> >
> > Of course, within going through the entire index by retrieving keys.
> >
> > Ayy tipes would be welcome
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

Reply via email to