Hi Taylor,

In our application, we had to solve a similar problem. Basically, we
needed to answer the question: "what are my friends doing today?". Our
initial solution was (pseudo code):
events = []
for friend in friends:
    friend_events = friend.what_are_you_doing_today()
    events.extend(friend_events)

Obviously, this code is executed serially. So for a list of 200
friends, it was taking about 8 seconds to execute! unacceptable.

So to make it faster, we figured we need to execute this code in
*parallel*. Note that db.get(keys) and db.get_by_key_name(key_names)
are parallelized by the datastore ...

So first, we made a new entity that stores what a friend is doing on a
given day. Something like (I'm using your pseudo code to describe the
data structure):
EventsForDay{
    Key userKey;
    Key[] events;
    DateTime day;
}
This is updates whenever a user attends/unattends an event. I'm not
sure why you're looping through friends and writing stuff ....

The trick is we also gave each of those entities a key_name:
<user_key>_<YYYY:MM:DD>. Now, to get stuff quickly, we do:
1. Calculate all the key-names for the EventsForDay entities for the
current user's friends. Easy.
2. db.get_by_key_name(key_names)  <= this happens quite fast.
3. To kick things up a notch, use memcache in #2. For entities not
found in memcache, retrieve them from the datastore.

Now for a list of 200 friends, this executes in less than 400ms!

Cheers,
-Mahmoud


On Jul 27, 3:10 pm, Taylor <tgaut...@gmail.com> wrote:
> Hi,
>
> I have a facebook app that naturally would like to perform set
> operations on a logged in user's friends against the data in the
> application (stored in app engine).  This presents a particular
> challenge, suppose for example that in my application a logged in user
> can subscribe to a particular event, we then have some classes like
> this:
>
> User {
>    Long key;
>    Long facebookId;
>    // etc...
>
> }
>
> Event {
>   Long key;
>   String name;
>   // etc...
>
> }
>
> Subscribed {
>   Long userKey;
>   Long eventKey;
>   Long start;
>   Long end;
>   //etc...
>
> }
>
> Now, if I have a particular user logged in, who has a set of friends,
> I want to know which events that set of friends have subscribed to.
> A normal query would be "SELECT * from SUBSCRIBED where userkey in
> ( <the set of friends> )"
>
> By now, I know that there are no conditional OR operations, joins, or
> in queries allowed in app engine.
>
> My initial attempt to solve this problem has resulted in writing out
> the intersection at subscription time using an additional table,
> something like:
>
> FriendSubscribed extends Subscribed {
>   Long friendKey;
>   // etc...
>
> }
>
> // for every friend of the logged in user, write a record so the query
> can simply pick up
> // all records for that friend
> void persistForFriends(Collection<Long> friends, Subscribed
> subscribed) {
>   for (Long id : friends) {
>     pm.persist(new FriendSubscribed(id, subscribed));
>   }
>
> }
>
> Now I have shifted the query burden to the write side and I can do a
> simple query on the FriendSubscribed table for a particular user which
> is nice and fast: "SELECT * from FRIENDSUBSCRIBED where friendkey =
> <myuserid>".  This returns all the subscriptions that the logged in
> user's friends have made (with some slight gotchas, e.g. if the user
> adds a new friend, then any subscriptions that the new friend already
> has will not be seen - this is tolerable)
>
> So, what's the problem?  On facebook, a user typically has order 100
> friends.  And I suspect many will have 500 and a few will have 1000+.
>
> Even for order 100 friends, at the current write speed (5/s) I can
> expect a write operation to take approx. 20s!!  This is rather long
> for a web operation, and gets worse as you increase the number of
> friends.
>
> I am curious if anyone has any ideas for solutions?  I have a few
> thoughts, but wanted to see what people thought before moving on to
> the next optimization.  Here are my thoughts:
>
> 1) Queue the friend write operation - it's not critical to the core
> write operation that all the "join" data be written out, it can always
> be re-computed at any point in time by analyzing the Subscribed
> records and current friend list for each friend.  Only problem is I am
> using Java and the scheduled tasks API is not yet supported.
> Furthermore, it still seems like a lot of work to go through if we
> imagine thousands or hundreds of thousands of users making
> subscriptions as each write operation they do gets multiplied by a
> factor of 100-1000.
>
> 2) Make an owned relationship from Events to Subscribed. Something
> like:
>
> Event {
>   Long key;
>   String name;
>
>   Set<Subscribed> subscribed;
> // etc...
>
> }
>
> Now reading an event records gives precisely the information I want
> and simply has to be culled down (the intersection of the logged in
> user's friends and the Event subscribed list is the list of friends
> that are subscribed to that event).  The problem I see with this
> implementation is high contention.  For every subscription event I
> have to lock out updates and make sure everyone is writing a coherent
> view of the set (don't we have to update the value into the Event
> record?)  Or I may be wrong and it may be that the ownership
> relationship is maintained automatically by the datastore and this
> isn't as much of a concern as I think it is as the owned relationship
> doesn't really write anything into the owner record, it just does a
> join-like query to populate the set on read...that might be ideal??
>
> 3) ??  Something else?
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to