Mark - I'm working on a similar system myself. I agree that using a string in 
the user class could cause problems in a large-scale app, as it would probably 
need sharding, and running into tens of thousands of followers could get heavy. 
I have made an implementation using a simple relationship model that can be 
used for anything: friendships, followers, related images, anything you like. I 
am not 100% sure it is the best model, or even built correctly, but I was 
considering posting it for comments. It is still a work in progress, so the 
code is not fully tested, but it may give you some ideas.
Any comments on this are much appreciated. As I said, I have looked at list 
properties, but I can only imagine they would need sharding.
I have not added any memcache support yet, but this will be added to the 
finished class. I will post my code if anyone is interested.
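
As a rough idea of how the memcache layer could sit in front of the key-name lookup used in the code that follows, here is a read-through sketch. Everything in it is a stand-in so that it runs anywhere: FakeCache only mimics the get/set/delete shape of a cache client, DATASTORE is a plain set of key names, and the function names is_linked and add_link are hypothetical, not part of my class.

```python
class FakeCache(object):
    """Dict-backed stand-in with a memcache-like get/set/delete shape."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None means "not cached"

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeCache()
DATASTORE = set()       # stored key names such as 'friends/martin/leo'
datastore_reads = [0]   # counts how often we fall through to the "datastore"


def is_linked(stored_key):
    """Read-through check: try the cache first, then the datastore."""
    cached = cache.get(stored_key)
    if cached is not None:
        return cached == 'y'
    datastore_reads[0] += 1
    found = stored_key in DATASTORE
    # Cache negative answers too ('n'), so repeated checks for
    # non-friends do not keep hitting the datastore.
    cache.set(stored_key, 'y' if found else 'n')
    return found


def add_link(stored_key):
    """Store the link and drop any stale 'n' entry from the cache."""
    DATASTORE.add(stored_key)
    cache.delete(stored_key)


add_link('friends/martin/leo')
is_linked('friends/martin/leo')   # goes to the datastore, then caches 'y'
is_linked('friends/martin/leo')   # answered from the cache
```

The 'y'/'n' markers are there because a cache miss is reported as None, so a cached "not a friend" answer needs its own value. Any code that adds or removes a link has to invalidate the cached entry, otherwise a stale answer would be served.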




#!/usr/bin/env python
#
# Copyright 2007 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from google.appengine.ext import db
import google.appengine.ext.db
from google.appengine.api import memcache
import shards


class collection(db.Model):
    """
    Stores relationships. This could be achieved using a ListProperty(db.Key),
    but that may run into issues, as many "sessions" could be appending keys
    at the same time. A list property may also have a performance issue if
    there are millions of siblings following one entity.

    Furthermore, we can memcache our keys for even faster performance when we
    check whether keys are siblings of parent urls.

    I feel that this approach, simple as it may be, may be a better one for
    long-term performance.
    
    The collection object stores a domain, a url and a key:

    domain, url, key

    The domain and url describe the parent object, and the key is the related
    sibling.

    domain = 'friends'
    url = 'martin'
    key = some instance key

    long_url = 'friends/martin'
    stored_key = 'friends/martin/[some key]'
    
    We can store friendships by storing one side of the friendship as the url
    and the siblings (friends) as keys.

    The query to test whether a sibling is a friend of a parent is very simple:
    stored_key = make_key(domain, url, key)
    exists = collection.get_by_key_name(stored_key)

    This works because when we store a link we set its key name to
    domain + url + key, built by the make_key function.

    We can memcache this, again using the stored_key as the cache key, for
    very quick requests. When we use the collection for, say, friendships,
    user pages need to know whether the current user is a friend of the page
    owner, and memcache should respond very fast in this scenario.
    
    Counters.

    We use our shards module to keep count of how many instances are related
    to the url (i.e. how many friends an entity has):

    counter_key = long_url + '_cnt'
    shards.increment(counter_key)

    (note the format of our key)
    
    Getting a list of all the keys that are siblings (so all the friends of
    an entity):

    long_url = make_url('friends', 'martin')  # 'friends/martin'

    all_siblings = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
                               long_url).fetch(limit)  # paging removed

    We use a simple query that filters on the url and returns the links. As
    the sibling keys are stored as strings, we don't need to worry about the
    datastore loading 'related instances'. Once we have our keys, we can build
    a short list of them and make one call to the datastore to load the
    instances by key.
    In theory this should be very quick, as only a few instances (say 10 at a
    time) are loaded, even for queries where potentially thousands of siblings
    may be present.
    
    
    # Make a list of keys so that we can load our instances in one call. In
    # the real world this might be 10 at a time (see limit in the query above).
    li = []
    for link in all_siblings:
        li.append(link.sibling)

    # Get a list of instances for the keys
    return db.get(li)
    
    
    The next question we could ask the datastore is:

    Who am I a sibling of (who am I a friend of)?

    This can be answered using another simple query:

    all_instances = db.GqlQuery('SELECT * FROM collection WHERE sibling = :1 '
                                'AND domain = :2',
                                str(sibling_key), 'friends').fetch(limit)

    Note that we pass 'friends' as the domain, as we want to return only
    links where the relationship is a friendship.
    
    
    Another example of use

    Let's say I want to create a group of people. This can be modelled by
    creating a model for groups:

    class Group(db.Model):
        name = db.StringProperty()

    Now we can do:

    martin = user("martin")
    leo = user("leo")

    group = Group(key_name="Club1", name="Club1")

    links.collection.add_sibling('groups', group.key(), martin.key())
    links.collection.add_sibling('groups', group.key(), leo.key())

    Note that in this example we identify the relationship as a group
    membership by using 'groups' as our domain.

    Now we can use the queries above to see who is in the group, whether a
    person is a member of a group, and which groups, say, martin is a member
    of:

    all_groups = db.GqlQuery('SELECT * FROM collection WHERE sibling = :1 '
                             'AND domain = :2',
                             str(martin.key()), 'groups').fetch(limit)

    The above lists all the 'groups' martin is a member of.
    

    Friendships:

    martin = user("martin")  # is the initiator
    leo = user("leo")  # is the acceptor

    # adds the relationship
    links.collection.add_sibling('friends', leo.key(), martin.key())
    # if the relationship is two-sided we simply reverse it
    links.collection.add_sibling('friends', martin.key(), leo.key())
    # (this may not be the best solution; we could instead write a query to
    # find reversed relationships)

    The above queries can be used to get the relationships.


    This class is useful in social apps where many relationships can be made.

    TO DO:

    1. test
    2. add memcache
    3. add the other built-in queries detailed in the outline above (list
       siblings, who am I a sibling of, am I a sibling)
    4. add remove relationship


    """
    
    
    """
    
    
    
    THE MODEL OBJECT
    domain, url, sibling

    the domain of the instance, for example 'friends'
    """
    domain=db.StringProperty()
    """
    a long url to describe our parent, like friends/martin
    we make the url by using:

    url = make_url(domain, url)
    
    """
    url   = db.StringProperty() 
    """the key of our sibling instance"""
    sibling = db.StringProperty() 
    
    @staticmethod
    def add_sibling(domain='', url=None, key=None):
        """
        Add a sibling key to a url's collection.
        The key should be something like numberfile.key().
        The url identifies the parent within the domain (for example 'martin'
        or a parent key); it should not include the sibling key.
        """
        # Make the unique stored key, which is the complete path long_url/key.
        stored_key = make_key(domain, url, key)
        exists = collection.get_by_key_name(stored_key)
        if exists is not None:
            # Key is already added as a link.
            return

        # Add our new stored key.
        long_url = make_url(domain, url)

        link = collection(key_name=stored_key)
        link.domain = domain
        link.url = long_url
        link.sibling = str(key)  # StringProperty needs a string, not a db.Key
        link.put()
        # Update our counter for the long_url, i.e. friends/martin.
        shards.increment(long_url + '_cnt')
    
    
    @staticmethod
    def get_siblings(domain, url, limit=10, offset=None):
        """
        Get the sibling instances for a given domain/url identifier, up to
        limit at a time. offset, if given, is a collection key (or its string
        form); only links with a greater key are returned.
        """
        # Make the long url, for example friends/martin.
        long_url = make_url(domain, url)

        if offset is None:
            q = db.GqlQuery('SELECT * FROM collection WHERE url = :1', long_url)
        else:
            if isinstance(offset, basestring):
                offset = db.Key(offset)
            q = db.GqlQuery('SELECT * FROM collection WHERE url = :1 '
                            'AND __key__ > :2', long_url, offset)

        # Build our list of string-encoded sibling keys.
        li = [link.sibling for link in q.fetch(limit)]

        # Return all the instances for the keys in our list.
        return db.get(li)
        
        

def make_url(domain='', url=None):
    """
    Makes a long url path from the passed domain and url, used for our
    key names.
    """
    if url is None:
        return None
    return domain + '/' + str(url)  # url may be a db.Key, so convert it
        

    
    
def make_key(domain='', url=None, sibling=None):
    """
    Makes a key name from the passed domain, url and sibling key.

    Key names are made like this:

    domain + url + key

    which for example could be

    friends/martin/[key]

    which defines the key as a sibling of our long url (domain + url).

    In a real-world app, using the key of the parent instance as the url
    would be safer, as a human-readable word (e.g. a person's name) may change.
    """
    if sibling is None:
        return None

    # Build our long url path.
    long_url = make_url(domain, url)
    if long_url is None:
        # Without this check, None + '/' below would raise a TypeError.
        return None

    return long_url + '/' + str(sibling)  # the key is converted to a string
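
To make the three questions from the docstring concrete (am I a sibling, list the siblings, who am I a sibling of), here is a minimal in-memory sketch of the same key-name scheme. It is not App Engine code: the LinkStore class and its method names are made up for illustration, a plain dict stands in for the datastore, and remove_sibling is included only because removal is on my to-do list. It also assumes the domain and url contain no '/' themselves.

```python
class LinkStore(object):
    """Hypothetical in-memory stand-in for the 'collection' kind."""

    def __init__(self):
        # key name 'domain/url/sibling' -> (domain, long_url, sibling)
        self._rows = {}

    @staticmethod
    def make_key(domain, url, sibling):
        return '%s/%s/%s' % (domain, url, sibling)

    def add_sibling(self, domain, url, sibling):
        name = self.make_key(domain, url, sibling)
        if name in self._rows:
            return False  # already linked, nothing to do
        self._rows[name] = (domain, '%s/%s' % (domain, url), str(sibling))
        return True

    def remove_sibling(self, domain, url, sibling):
        name = self.make_key(domain, url, sibling)
        return self._rows.pop(name, None) is not None

    def is_sibling(self, domain, url, sibling):
        # A single lookup by key name, no query needed.
        return self.make_key(domain, url, sibling) in self._rows

    def get_siblings(self, domain, url, limit=10, after=None):
        # One sorted page of siblings; 'after' is the last sibling
        # returned on the previous page.
        long_url = '%s/%s' % (domain, url)
        found = sorted(s for (d, u, s) in self._rows.values() if u == long_url)
        if after is not None:
            found = [s for s in found if s > after]
        return found[:limit]

    def get_parents(self, domain, sibling):
        # Reverse lookup: which urls in this domain list 'sibling'?
        prefix = domain + '/'
        return sorted(u[len(prefix):] for (d, u, s) in self._rows.values()
                      if d == domain and s == str(sibling))


store = LinkStore()
store.add_sibling('friends', 'martin', 'leo')
store.add_sibling('groups', 'Club1', 'martin')
store.add_sibling('groups', 'Club2', 'martin')
friends_of_martin = store.get_siblings('friends', 'martin')
groups_of_martin = store.get_parents('groups', 'martin')
```

The membership test is the cheap operation here, since it is a direct lookup by name. The two listing operations scan the rows in this sketch, which is where the real class relies on datastore queries on url and on sibling plus domain.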















 
Regards
 
 
Martin Webb

 
 
 




________________________________
From: Mark <mar...@gmail.com>
To: Google App Engine <google-appengine@googlegroups.com>
Sent: Mon, 19 April, 2010 14:44:59
Subject: [google-appengine] Implementing a follower system similar to Twitter?

Hi,

I'm building a service similar to twitter, where users can follow one
another. My user class looks like this:

  @PersistenceCapable
  class User {

      @PrimaryKey
      private String mUsername;
  }

I'm thinking to store the follower/followee relationships in an
intermediate class, and am not sure if this makes sense, or if there
is a better way to do it. I'm thinking of the case where I'm viewing
another user's info page, and want to follow them. Then I'd create a
new object like this:

  @PersistenceCapable
  class Relationship {

     @PrimaryKey
     @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
     private Key mKey;

     @Persistent
     private String mFollowerUsername;

     @Persistent
     private String mFolloweeUsername;
  }

I can then query for the first N records above when needing to show a
list of followers, and page as necessary. Is there a better pattern
for doing this? I was reading this post:

  
http://groups.google.com/group/google-appengine/browse_thread/thread/18bddc764f2647b9/d567b638a201fde8?lnk=gst&q=follower#d567b638a201fde8

where Brett suggested storing the follower/followee user ids directly
in the User object (perhaps as an ArrayList<String>). In some sense
this would be easier but if those lists have 10,000 entries each,
won't it take a long time to load the User object? Loading user
objects probably is a frequent action in my case. Also to support
adding and deletion of followers, I'd need to search the
ArrayList<String> for duplicates (guess I could use a set<String>
instead).

Not sure if this question belongs here, it seems pretty JDO specific,
but I'm wondering what the best way to do this is so it scales well on
App Engine.

Thank you

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.


      
