Every time a user rates a product, update the average rating for that product as well. Unless more than a dozen users need to rate a single product simultaneously, you don't need to shard the count at all.
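
A rough sketch of what I mean, in plain Python rather than the actual datastore API. The `Product` class, the `rate` function and the in-memory `ratings` dict are all made up for illustration. On App Engine the ratings would be their own entities and the body of `rate` would run inside a single transaction. The idea is to keep a running sum and count on the product, so a changed rating only adjusts the sum and the stored average is always ready to sort on.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical stand-in for a product entity; not a datastore model."""
    name: str
    rating_sum: int = 0
    rating_count: int = 0
    average_rating: float = 0.0  # stored value that queries would sort on
    ratings: dict = field(default_factory=dict)  # user_id -> rating (1-5)


def rate(product, user_id, rating):
    """Record a rating and refresh the stored average in the same step.

    Assumption: on App Engine this whole body would be one transaction.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    previous = product.ratings.get(user_id)
    if previous is None:
        # First rating from this user: the count grows by one.
        product.rating_count += 1
    else:
        # The user changed their mind: drop the old value, keep the count.
        product.rating_sum -= previous
    product.rating_sum += rating
    product.ratings[user_id] = rating
    product.average_rating = product.rating_sum / product.rating_count


def sort_by_rating(products):
    """Highest average first, using only the stored average."""
    return sorted(products, key=lambda p: p.average_rating, reverse=True)
```

Since the average is written at rating time, sorting never has to recount anything. The cost is one extra write per rating, which should only hurt if many users hit the same product at the same moment.
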
On Nov 2, 5:33 pm, "Jay Freeman (saurik)" <[EMAIL PROTECTED]> wrote:
> Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App
> Engine", and I've read tons now on entity groups and transactions, but I'm
> just completely at a loss to come up with an efficient way to model what
> seems like a very common scenario. :( If anyone is willing to chime in with
> ideas I'd be really appreciative.
>
> Here's the basic idea:
>
> a number of products
> a number of users
> users rate products 1-5
> products have an "average rating"
> users can sort products by rating
>
> Equivalent, and closer to the existing examples:
>
> a number of blog posts
> a number of users
> users post (many, many) comments on blog posts
> blog posts have a "comment count"
> users can list sort posts by comment count
>
> The first problem is that the counter examples I've seen haven't actually
> needed to count anything that also needed to be in the datastore. As near as
> I can tell, to accurately count a set of ratings for a particular product I
> would need to have an entity group per statistics shard that also contained
> all of the actual user/product/rating entities. If a user also was able to
> update their rating/comment this means that the assertion that it is easy to
> later change the number of shards becomes false: if I were to break it up
> into five groups (let's say) and get a hundred thousand ratings for some
> product (very likely in my case) then each group is going to have 20,000
> entities in it already (with high contention for those updates, even though
> they don't affect the count), and I have no real way to (safely) move them
> between shards later.
>
> The second problem is that the existing discussions I've seen of this problem
> ignore sorting by these statistics. Example: not only might you want the
> total number of comments posted on a blog, you also might want to find blog
> entries "most discussed" (those with the most comments).
> The mechanisms of
> using memcache to store a consistent/efficient count thereby no longer work
> very well: you need to store the information in an indexed entity. Also, you
> can't afford for the information to ever be out of date: if you have a few
> thousand products in your product catalog and you don't have an up-to-date
> count for any of them, it isn't an option to rebuild the master statistics
> for all of them before running your query. This means you need to be
> constantly maintaining the updated global counts as you change things, and if
> you want them to be safe/accurate that will need to be done in a transaction,
> which pretty much pulls the entire product/post into a single giant entity
> group.
>
> Put these together and it seems like this very website concept simply
> /requires/ a really slow implementation :(. Specifically: every product/post
> gets a single entity group, and the comments/ratings are stored in it, with a
> single master count stored on the product/post entity (updated via
> transactions as comments/ratings added); thereby causing massive contention
> as everyone immediately swoops in to put their comment/rating in. If this is
> simply "true" then I'll just go ahead and build this and feel "ok" in that I
> did everything I could, but as long as I have doubts I'm finding it difficult
> to force myself to lock this scheme into my data ;P.
>
> -J