An elegant solution for this fan out problem would be to use the Relation Index Entity pattern that Google suggested years ago ( http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html ). Thanks to Google's amazingly powerful datastore, you'd need to write only a few lines of code to solve what is essentially a difficult scalability problem. If I understand you correctly, your approach c) is similar to this.
HOWEVER! The new pricing scheme has made the above solution cost prohibitive. See its cost breakdown here: http://groups.google.com/group/google-appengine/browse_thread/thread/1e8d694ce1e30cc7/b2e9e5476b33aea5?q=#b2e9e5476b33aea5 (note that the scary figure I initially arrived upon was actually an underestimation!) In the end I had to throw away a solution which took no more than 1 day of learning, thinking and coding, and build an alternative solution that took a couple of weeks to complete (and involves many moving parts, thus harder to maintain). Here's the summary of my new solution: 1. As users make a new post, save the post ID in data structures in memcache. 2. As a user views his feed, query said memcache data structures to grab post IDs relevant to him 3. Take those IDs and query the datastore to retrieve matching entities. I use Objectify's caching (memcache) to reduce datastore hits in this step. 4. Keep data in the datastore in such a way that the data structures in memcache can be restored, in case memcache loses them. Now my app is different from Twitter/Facebook in that people don't follow each other but rather a small number of topics, which keeps the amount of data in memcache to a minimum. This lucky coincidence allows all users to never hit the datastore as long as memcache keeps the data structures. Unless users go back far enough and view really old posts, in which case the data is pulled from the datastore. Luckily this is rare. Your case is different as the combination of user follows would result in much bigger data. Ideally you'd want to add some intelligence so that most use cases can be satisfied by memcache thereby keeping the datastore costs low. Nowadays I try to avoid using the datastore as much as possible by using memcache or pull queues. It's a shame though, because the single most compelling reason to use GAE over other platforms is the awesomeness of the datastore, IMHO. Here's my response to another fan out question http://groups.google.com/group/google-appengine/msg/53d415bac2a04dca On Feb 9, 6:33 am, Barry Hunter <barrybhun...@gmail.com> wrote: > Pretty sure twitter does use c) > > I think it called "fan out". > > Overall its probably the most scalable - as you note a) is pretty hard-core. > > However saying that using DataStore 'stringlists' could make it work > it pretty well. When someone makes a post, you add a string list > listing all their followers. Each post will have to store the whole > followers list. > > Then to display someone's "inbox" you just run a query looking for all > posts with the user listed in the followers list. > > - writing the post - will be expensive - due to all the index writes. > (2 per follower?) But ultimately that might be cheaper than writing a > whole entity to everyones inbox. > > - and when someone gains or looses a follower you have to iterate > though their posts adding/removing the new user. (but compare that to > have to iterate though all their posts, and copy/remove all the posts > into the second users inbox) > > ... there probably is no perfect answer :) > > > > > > > > On Wed, Feb 8, 2012 at 7:14 PM, Kaan Soral <kaanso...@gmail.com> wrote: > > I have searched "twitter" in this group and went back 1 year in posts but > > couldn't find anything but I remember a discussion like this before > > > Problem: > > On both Twtter and FB, when you visit their Homepage, you see a summary of > > what your friends or interests are doing. > > You might have up to 1000 of these on average. > > How to combine this information? > > > Some thinking: > > a) A naive approach would be to query for everyone, every time = > > impossible-ish > > b) To make this naive approach work we might determine popular friends of a > > user and achieve a query once, construct a callback for near time updates > > approach (to achieve real time info) > > c) Another approach is to not query at all but update everyone who are > > subscribed to a user, when that user posts an action > > This would work for FB I guess, but for Twtter, considering some people have > > 1Ms of followers, this may be a problem > > But the complexity is the sum of all followers over the network, and this > > number may be low, considering not everyone has that many followers, so this > > might be worth it > > > I am leaning over the (c) method, what do you guys think, how would you > > implement these Social Networks? > > (I combined these 2 networks in the problem definition, but you may provide > > separate solutions) > > > -- > > You received this message because you are subscribed to the Google Groups > > "Google App Engine" group. > > To view this discussion on the web visit > >https://groups.google.com/d/msg/google-appengine/-/1WEnOApSj4sJ. > > To post to this group, send email to google-appengine@googlegroups.com. > > To unsubscribe from this group, send email to > > google-appengine+unsubscr...@googlegroups.com. > > For more options, visit this group at > >http://groups.google.com/group/google-appengine?hl=en. -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com. To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.