Hi Taylor, In our application, we had to solve a similar problem. Basically, we needed to answer the question: "what are my friends doing today?". Our initial solution was (pseudo code): events = [] for friend in friends: friend_events = friend.what_are_you_doing_today() events.extend(friend_events)
Obviously, this code is executed serially. So for a list of 200 friends, it was taking about 8 seconds to execute! unacceptable. So to make it faster, we figured we need to execute this code in *parallel*. Note that db.get(keys) and db.get_by_key_name(key_names) are parallelized by the datastore ... So first, we made a new entity that stores what a friend is doing on a given day. Something like (I'm using your pseudo code to describe the data structure): EventsForDay{ Key userKey; Key[] events; DateTime day; } This is updates whenever a user attends/unattends an event. I'm not sure why you're looping through friends and writing stuff .... The trick is we also gave each of those entities a key_name: <user_key>_<YYYY:MM:DD>. Now, to get stuff quickly, we do: 1. Calculate all the key-names for the EventsForDay entities for the current user's friends. Easy. 2. db.get_by_key_name(key_names) <= this happens quite fast. 3. To kick things up a notch, use memcache in #2. For entities not found in memcache, retrieve them from the datastore. Now for a list of 200 friends, this executes in less than 400ms! Cheers, -Mahmoud On Jul 27, 3:10 pm, Taylor <tgaut...@gmail.com> wrote: > Hi, > > I have a facebook app that naturally would like to perform set > operations on a logged in user's friends against the data in the > application (stored in app engine). This presents a particular > challenge, suppose for example that in my application a logged in user > can subscribe to a particular event, we then have some classes like > this: > > User { > Long key; > Long facebookId; > // etc... > > } > > Event { > Long key; > String name; > // etc... > > } > > Subscribed { > Long userKey; > Long eventKey; > Long start; > Long end; > //etc... > > } > > Now, if I have a particular user logged in, who has a set of friends, > I want to know which events that set of friends have subscribed to. > A normal query would be "SELECT * from SUBSCRIBED where userkey in > ( <the set of friends> )" > > By now, I know that there are no conditional OR operations, joins, or > in queries allowed in app engine. > > My initial attempt to solve this problem has resulted in writing out > the intersection at subscription time using an additional table, > something like: > > FriendSubscribed extends Subscribed { > Long friendKey; > // etc... > > } > > // for every friend of the logged in user, write a record so the query > can simply pick up > // all records for that friend > void persistForFriends(Collection<Long> friends, Subscribed > subscribed) { > for (Long id : friends) { > pm.persist(new FriendSubscribed(id, subscribed)); > } > > } > > Now I have shifted the query burden to the write side and I can do a > simple query on the FriendSubscribed table for a particular user which > is nice and fast: "SELECT * from FRIENDSUBSCRIBED where friendkey = > <myuserid>". This returns all the subscriptions that the logged in > user's friends have made (with some slight gotchas, e.g. if the user > adds a new friend, then any subscriptions that the new friend already > has will not be seen - this is tolerable) > > So, what's the problem? On facebook, a user typically has order 100 > friends. And I suspect many will have 500 and a few will have 1000+. > > Even for order 100 friends, at the current write speed (5/s) I can > expect a write operation to take approx. 20s!! This is rather long > for a web operation, and gets worse as you increase the number of > friends. > > I am curious if anyone has any ideas for solutions? I have a few > thoughts, but wanted to see what people thought before moving on to > the next optimization. Here are my thoughts: > > 1) Queue the friend write operation - it's not critical to the core > write operation that all the "join" data be written out, it can always > be re-computed at any point in time by analyzing the Subscribed > records and current friend list for each friend. Only problem is I am > using Java and the scheduled tasks API is not yet supported. > Furthermore, it still seems like a lot of work to go through if we > imagine thousands or hundreds of thousands of users making > subscriptions as each write operation they do gets multiplied by a > factor of 100-1000. > > 2) Make an owned relationship from Events to Subscribed. Something > like: > > Event { > Long key; > String name; > > Set<Subscribed> subscribed; > // etc... > > } > > Now reading an event records gives precisely the information I want > and simply has to be culled down (the intersection of the logged in > user's friends and the Event subscribed list is the list of friends > that are subscribed to that event). The problem I see with this > implementation is high contention. For every subscription event I > have to lock out updates and make sure everyone is writing a coherent > view of the set (don't we have to update the value into the Event > record?) Or I may be wrong and it may be that the ownership > relationship is maintained automatically by the datastore and this > isn't as much of a concern as I think it is as the owned relationship > doesn't really write anything into the owner record, it just does a > join-like query to populate the set on read...that might be ideal?? > > 3) ?? Something else? --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---