Good post, David! Just one thing to add - there is in fact an existing library that already implements this: http://code.google.com/p/google-app-engine-ranklist/
-Nick Johnson On Thu, Nov 3, 2011 at 12:54 AM, David Whittaker <dpwhitta...@gmail.com>wrote: > I doubt BigTable keeps track of index positions... it seems from its > architecture (distributed sorted list of keys) that it would require > updating every entity in the table on every insert to keep track of > each entity's position. So, the question becomes: do you really need > this information, or can your application be restructured to use the > index key itself for whatever purpose you have in mind for the index > key's position? > > I can think of at least one example where you might really need this > information: a ranking system that ranks a huge list of entities based > on where they fall in the sorted list of some key (probably a > numerical "strength" value calculated from the rest of the entity's > properties). Let's say the entities represent Users on some > competitive flash gaming site, and JoeNewUser, who has been a member > of the site for 3 weeks and played 734 games of Tower Defense, wants > to know what his rank is. Joe's rank is 73,248, but we don't want to > have to look at the 73,247 people ahead of him (sorted by TotalScore > DESC) to find that out. If your actual use case has nothing in common > with this, let me know, but I think it at least parallels the general > gist of what you are asking for. > > What you are looking for is a B-Tree, with the addition of subtree > size in the subtree pointers: > http://en.wikipedia.org/wiki/B-tree > > The data model for that might look something like this: > class BTree(Model): > totalScores = ListProperty(int) #a sorted list totalScores, up to > some limit long > user = ListProperty(Key(User)) #same length as totalScores, points > to the User the score belongs to > pointers = ListProperty(Key(BTree)) #pointers to child Nodes > subTreeCount = ListProperty(int) #number of keys in all child > Nodes below this pointer > subTreeSum = ListProperty(int) #a prefix sum (running total) > of subTreeCount +1 for each totalScore > > So it's really just a bunch of list properties. You set how big you > want the list to get (bigger is better - fewer datastore calls for > lookups - but keep in mind the 1 MB limit). Follow the procedures on > the wiki page linked above, but on insert, add one to the subTreeCount > for the each pointer you follow (and 1 to every subTreeSum from there > to the right on each node). To find the index of an item you are > searching for, keep track of the subTreeSum of all the subtrees to the > left of the subtree you are following, plus the nodes you pass along > the way, and add them all up to get the index of the final node. > Don't forget to binary search the totalScores list - there's no need > to look at every item in it, since it's already sorted. You may have > to do some linear searching at the end if several users have the same > totalScore, but you can avoid that by using fixed point numbers and > appending the UserID to the key to ensure uniqueness while still > sorting correctly (lexicographically). Split operations are > relatively simple too, since you have all the data you need to > populate the new subTreeCounts and subTreeSums... it's just > bookkeeping. > > Looking at the complexity of this, if you have 100 keys per BTree > node, and a million entities, you can find the one you are looking for > and its index in 3 sequential datastore calls, while comparing to 8 > keys per node. 1000 keys per page gets you down to 2 datastore > lookups and 10 comparisons per page, but you start spending more and > more CPU time serializing and deserializing the data in and out of the > datastore, so you'll have to play around with it to find the > Goldilocks number. Good luck, and let me know if you need a more > detailed explanation (though it will probably take a few days). > > David > > > On Nov 1, 3:18 am, de Witte <jcreator.xi...@gmail.com> wrote: > > Hi, > > > > I'm looking for the following: > > > > Is it possible to retrieve the position of an entity in an index table? > > > > Of course, within going through the entire index by retrieving keys. > > > > Ayy tipes would be welcome > > -- > You received this message because you are subscribed to the Google Groups > "Google App Engine" group. > To post to this group, send email to google-appengine@googlegroups.com. > To unsubscribe from this group, send email to > google-appengine+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/google-appengine?hl=en. > > -- Nick Johnson, Developer Programs Engineer, App Engine -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com. To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.