Mark - I'm working on a similar system myself. I agree that using a list of strings in the user class is likely to cause problems in a large-scale app, as it would probably need sharding, and loading a user with 10,000 followers could get heavy.

I have made an implementation using a simple relationship model, and it can be used for anything: friendships, followers, related images, anything you like. I am not 100% sure it is the best model, or even built correctly, but I was considering posting it for comments. It is still a work in progress, so the code is not fully tested, but it may give you some ideas. Any comments on this are much appreciated. As I have said, I have looked at list properties, but I can only imagine they would need sharding and so on. I have not added any memcache support yet, but this will be added to the finished class. I will post my code if anyone is interested.
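To make the idea concrete without App Engine, here is a rough plain-Python sketch of the same key-name scheme. It is only an illustration under stated assumptions: the RelationshipStore class, its dict storage and the `after` paging argument are made up for this sketch and merely stand in for the datastore, the key-name lookup and the sharded counter. It is not the real class.

```python
# Plain-Python stand-in for the relationship model (hypothetical names,
# no datastore involved): one record per relationship, keyed by
# "domain/url/sibling".


def make_url(domain, url):
    """Build the parent path, e.g. 'friends/martin'."""
    return '%s/%s' % (domain, url)


def make_key(domain, url, sibling):
    """Build the unique record key, e.g. 'friends/martin/leo'."""
    return '%s/%s' % (make_url(domain, url), sibling)


class RelationshipStore(object):
    """In-memory illustration only; a dict plays the role of the datastore."""

    def __init__(self):
        self.records = {}   # stored_key -> (long_url, sibling)
        self.counts = {}    # long_url -> number of siblings

    def add_sibling(self, domain, url, sibling):
        stored_key = make_key(domain, url, sibling)
        if stored_key in self.records:
            return False  # already linked, so the counter is left alone
        long_url = make_url(domain, url)
        self.records[stored_key] = (long_url, sibling)
        self.counts[long_url] = self.counts.get(long_url, 0) + 1
        return True

    def is_sibling(self, domain, url, sibling):
        # A single key lookup, no query needed.
        return make_key(domain, url, sibling) in self.records

    def get_siblings(self, domain, url, limit=10, after=None):
        """Return up to `limit` siblings in key order, starting after `after`."""
        long_url = make_url(domain, url)
        start = make_key(domain, url, after) if after is not None else None
        page = []
        for stored_key in sorted(self.records):
            record_url, sibling = self.records[stored_key]
            if record_url != long_url:
                continue
            if start is not None and stored_key <= start:
                continue
            page.append(sibling)
            if len(page) == limit:
                break
        return page

    def count(self, domain, url):
        return self.counts.get(make_url(domain, url), 0)


store = RelationshipStore()
store.add_sibling('friends', 'martin', 'leo')
store.add_sibling('friends', 'martin', 'mark')
store.add_sibling('friends', 'martin', 'leo')    # duplicate, ignored
print(store.is_sibling('friends', 'martin', 'leo'))
print(store.count('friends', 'martin'))
print(store.get_siblings('friends', 'martin', limit=1, after='leo'))
```

The point of the sketch is that "is X a friend of Y" is a single key lookup, while "list the friends of Y" is a filter on the parent path that can be paged by resuming after the last key seen.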
#!/usr/bin/env python
#
# Copyright 2007 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from google.appengine.ext import db
from google.appengine.api import memcache

import shards  # our own sharded-counter module


class collection(db.Model):
    """
    Stores relationships.

    This could be achieved using a ListProperty(db.Key), but that may run
    into issues, as it is possible that many "sessions" could be appending
    keys at the same time. Also, a list property may have a performance issue
    if there are millions of siblings following one entity. Furthermore, we
    can memcache our keys for even faster performance when we check if keys
    are siblings of parent urls. I feel that this approach, simple as it may
    be, may be a better approach for long-term performance.

    The collection object stores a domain, a url and a key:

        domain, url, key

    where the domain and url describe the object and the key is the related
    sibling:

        domain = 'friend'
        url = 'martin'
        key = some instance key
        long_url = 'friend/martin'
        stored_key = 'friend/martin/[some key]'

    We can store friendships by storing one side of the friendship as the url
    and the siblings (friends) as keys.
    The query to test if a sibling is a friend of a parent is VERY simple:

        stored_key = make_key(domain, url, key)
        exists = collection.get_by_key_name(stored_key)

    This works because, when we store a key, we store the key name as
    domain+url+key; this is done using the make_key function.

    We can memcache this, again using the stored_key as the cache key, for
    super quick requests. When we are using the collection for, say,
    friendships, user pages will need to know if the current user is a friend
    of the page owner; memcache will respond very fast in this scenario.

    Counters

    We use our shard class for keeping count of how many instances are
    related to the url (i.e. how many friends an entity has):

        counter_key = long_url + '_cnt'
        shards.increment(counter_key)

    (note the format of our key)

    Getting a list of all the keys that are siblings (so all the friends of
    an entity):

        long_url = make_url('friends', 'martin')  # 'friends/martin'
        all_siblings = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
                                   long_url).fetch(limit)  # paging removed

    We use a simple query that filters on the url and returns the keys. As
    the keys are strings, we don't need to worry about the datastore loading
    'related instances'. Once we get our keys, we can create a short list of
    keys [] and then make one call to the datastore to load the instances by
    key.
    This in theory should be super quick, as only a few instances, say 10 at
    a time, are loaded for queries where potentially thousands of siblings
    may be present:

        # make a list of keys so that we can load our instances in a flash;
        # in the real world this might be 10 at a time (see limit in the
        # above query)
        li = []
        for link in all_siblings:
            li.append(link.sibling)
        # get a list of instances for the keys
        return db.get(li)

    The next question we could ask the datastore is: who am I a sibling of
    (who am I a friend of)? This can be answered using another simple query:

        all_instances = db.GqlQuery(
            'SELECT * FROM collection WHERE sibling = :1 AND domain = :2',
            str(sibling_key), 'friends').fetch(limit)

    Note that we use 'friends' as our domain, as we want to return only keys
    where the relationship is a friendship.

    Another example of use

    Let's say I want to create a group of people. This can be modelled by
    creating a model for groups:

        class Group(db.Model):
            name = db.StringProperty()

    Now we can do:

        martin = user("martin")
        leo = user("leo")
        group = Group(key_name="Club1", name="Club1")
        group.put()  # save the group so it can be loaded later
        links.collection.add_sibling('groups', group.key(), martin.key())
        links.collection.add_sibling('groups', group.key(), leo.key())

    Note that in this example we identify the relationship as a group by
    using 'groups' as our domain. Now we can use the queries above to see
    who is in the group, whether a person is a member of a group, and what
    groups, say, martin is a member of:

        all_groups = db.GqlQuery(
            'SELECT * FROM collection WHERE sibling = :1 AND domain = :2',
            str(martin.key()), 'groups').fetch(limit)

    The above lists all the 'groups' martin is a member of.

    Friendships:

        martin = user("martin")  # is the initiator
        leo = user("leo")        # is the acceptor
        # adds the relationship
        links.collection.add_sibling('friends', leo.key(), martin.key())
        # if the relationship is two-sided we simply reverse it
        links.collection.add_sibling('friends', martin.key(), leo.key())
        # this may not be the best solution; we could simply create a query
        # to find reversed relationships?
    The above queries can be used to get the relationships.

    This class is useful in social apps where many relationships can be made.

    TO DO:
      1. test
      2. add memcache
      3. add other built-in queries as detailed in the header outline
         (list siblings, who am I a sibling of, am I a sibling)
      4. add remove relationship
    """

    # THE MODEL OBJECT: domain, url, sibling

    # the domain of the instance, for example 'friends'
    domain = db.StringProperty()

    # a url to describe our parent, like friends/martin;
    # we make the url using url = make_url(domain, url)
    url = db.StringProperty()

    # the key of our sibling instance, stored as a string
    sibling = db.StringProperty()

    @staticmethod
    def add_sibling(domain='', url=None, key=None):
        """
        Add a sibling key to a url's collection.

        The key should be like numberfile.key(). The url identifies the
        parent (for example 'martin'); the domain is prepended to it to make
        the long url path, and it must not include the key.
        """
        # make the unique stored key, which is the complete path long_url/key
        stored_key = make_key(domain, url, key)
        exists = collection.get_by_key_name(stored_key)
        if exists is not None:
            # key is already added as a link
            return

        # add our new stored key
        long_url = make_url(domain, url)
        link = collection(key_name=stored_key)
        link.domain = domain
        link.url = long_url
        link.sibling = str(key)  # StringProperty needs a string, not a db.Key
        link.put()

        # update our counter for the long_url, i.e. friends/martin
        shards.increment(long_url + '_cnt')

    @staticmethod
    def get_siblings(domain, url, limit=10, offset=None):
        """Get the sibling instances for a given domain/url identifier."""
        # make the long url, for example friends/martin
        long_url = make_url(domain, url)
        if offset is None:
            q = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
                            long_url)
        else:
            if isinstance(offset, basestring):
                offset = db.Key(offset)
            q = db.GqlQuery('SELECT * FROM collection '
                            'WHERE url = :1 AND __key__ > :2',
                            long_url, offset)

        # build our list of sibling keys, at most `limit` of them
        li = [link.sibling for link in q.fetch(limit)]

        # return all the instances for the keys in our list
        return db.get(li)


def make_url(domain='', url=None):
    """
    makes a long url path using the passed url and domain
    used for our key_names
    """
    if url is None:
        return None
    return domain + '/' + str(url)  # the url may be passed as a db.Key


def make_key(domain='', url=None, sibling=None):
    """
    makes a url using the passed url and key
    used for our key_names

    keys are made like this: domain+url+key, which for example could be
    friends/martin/[key]
    which defines the key as a sibling of our long url (domain+url)

    in our real world app using the key of the instance would be safer, as
    our human readable word may change, i.e. a person's name
    """
    if sibling is None:
        return None
    # build our long url path
    long_url = make_url(domain, url)
    # should we exit if long_url is None? As written, the line below raises
    # a TypeError in that case
    return long_url + '/' + str(sibling)  # the key is converted to a string

Regards
Martin Webb

________________________________
From: Mark <mar...@gmail.com>
To: Google App Engine <google-appengine@googlegroups.com>
Sent: Mon, 19 April, 2010 14:44:59
Subject: [google-appengine] Implementing a follower system similar to Twitter?

Hi,

I'm building a service similar to Twitter, where users can follow one another. My user class looks like this:

@PersistenceCapable
class User {
    @PrimaryKey
    private String mUsername;
}

I'm thinking of storing the follower/followee relationships in an intermediate class, and am not sure if this makes sense, or if there is a better way to do it.
I'm thinking of the case where I'm viewing another user's info page, and want to follow them. Then I'd create a new object like this:

@PersistenceCapable
class Relationship {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key mKey;

    @Persistent
    private String mFollowerUsername;

    @Persistent
    private String mFolloweeUsername;
}

I can then query for the first N records above when needing to show a list of followers, and page as necessary. Is there a better pattern for doing this? I was reading this post:

http://groups.google.com/group/google-appengine/browse_thread/thread/18bddc764f2647b9/d567b638a201fde8?lnk=gst&q=follower#d567b638a201fde8

where Brett suggested storing the follower/followee user ids directly in the User object (perhaps as an ArrayList<String>). In some sense this would be easier, but if those lists have 10,000 entries each, won't it take a long time to load the User object? Loading user objects is probably a frequent action in my case. Also, to support adding and deleting followers, I'd need to search the ArrayList<String> for duplicates (guess I could use a Set<String> instead).

Not sure if this question belongs here, as it seems pretty JDO specific, but I'm wondering what the best way to do this is so it scales well on App Engine.

Thank you

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.