An elegant solution for this fan out problem would be to use the
Relation Index Entity pattern that Google suggested years ago (
http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
). Thanks to Google's amazingly powerful datastore, you'd need to
write only a few lines of code to solve what is essentially a
difficult scalability problem. If I understand you correctly, your
approach c) is similar to this.

HOWEVER! The new pricing scheme has made the above solution cost
prohibitive. See its cost breakdown here:
http://groups.google.com/group/google-appengine/browse_thread/thread/1e8d694ce1e30cc7/b2e9e5476b33aea5?q=#b2e9e5476b33aea5
(note that the scary figure I initially arrived upon was actually an
underestimation!)

In the end I had to throw away a solution which took no more than 1
day of learning, thinking and coding, and build an alternative
solution that took a couple of weeks to complete (and involves many
moving parts, thus harder to maintain).

Here's the summary of my new solution:
1. As users make a new post, save the post ID in data structures in
memcache.

2. As a user views his feed, query said memcache data structures to
grab post IDs relevant to him

3. Take those IDs and query the datastore to retrieve matching
entities. I use Objectify's caching (memcache) to reduce datastore
hits in this step.

4. Keep data in the datastore in such a way that the data structures
in memcache can be restored, in case memcache loses them.

Now my app is different from Twitter/Facebook in that people don't
follow each other but rather a small number of topics, which keeps the
amount of data in memcache to a minimum. This lucky coincidence allows
all users to never hit the datastore as long as memcache keeps the
data structures. Unless users go back far enough and view really old
posts, in which case the data is pulled from the datastore. Luckily
this is rare.

Your case is different as the combination of user follows would result
in much bigger data. Ideally you'd want to add some intelligence so
that most use cases can be satisfied by memcache thereby keeping the
datastore costs low.

Nowadays I try to avoid using the datastore as much as possible by
using memcache or pull queues. It's a shame though, because the single
most compelling reason to use GAE over other platforms is the
awesomeness of the datastore, IMHO.

Here's my response to another fan out question
http://groups.google.com/group/google-appengine/msg/53d415bac2a04dca

On Feb 9, 6:33 am, Barry Hunter <barrybhun...@gmail.com> wrote:
> Pretty sure twitter does use c)
>
> I think it called "fan out".
>
> Overall its probably the most scalable - as you note a) is pretty hard-core.
>
> However saying that using DataStore 'stringlists' could make it work
> it pretty well. When someone makes a post, you add a string list
> listing all their followers. Each post will have to store the whole
> followers list.
>
> Then to display someone's "inbox" you just run a query looking for all
> posts with the user listed in the followers list.
>
> - writing the post - will be expensive - due to all the index writes.
> (2 per follower?) But ultimately that might be cheaper than writing a
> whole entity to everyones inbox.
>
> - and when someone gains or looses a follower you have to iterate
> though their posts adding/removing the new user. (but compare that to
> have to iterate though all their posts, and copy/remove all the posts
> into the second users inbox)
>
> ... there probably is no perfect answer :)
>
>
>
>
>
>
>
> On Wed, Feb 8, 2012 at 7:14 PM, Kaan Soral <kaanso...@gmail.com> wrote:
> > I have searched "twitter" in this group and went back 1 year in posts but
> > couldn't find anything but I remember a discussion like this before
>
> > Problem:
> > On both Twtter and FB, when you visit their Homepage, you see a summary of
> > what your friends or interests are doing.
> > You might have up to 1000 of these on average.
> > How to combine this information?
>
> > Some thinking:
> > a) A naive approach would be to query for everyone, every time =
> > impossible-ish
> > b) To make this naive approach work we might determine popular friends of a
> > user and achieve a query once, construct a callback for near time updates
> > approach (to achieve real time info)
> > c) Another approach is to not query at all but update everyone who are
> > subscribed to a user, when that user posts an action
> > This would work for FB I guess, but for Twtter, considering some people have
> > 1Ms of followers, this may be a problem
> > But the complexity is the sum of all followers over the network, and this
> > number may be low, considering not everyone has that many followers, so this
> > might be worth it
>
> > I am leaning over the (c) method, what do you guys think, how would you
> > implement these Social Networks?
> > (I combined these 2 networks in the problem definition, but you may provide
> > separate solutions)
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To view this discussion on the web visit
> >https://groups.google.com/d/msg/google-appengine/-/1WEnOApSj4sJ.
> > To post to this group, send email to google-appengine@googlegroups.com.
> > To unsubscribe from this group, send email to
> > google-appengine+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> >http://groups.google.com/group/google-appengine?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

Reply via email to