Yeah. The problem is that I'd actually bet on that happening (more than a dozen people rating the same package at once) ;P. The application in question (something akin to Apple's App Store or Google's Android Market) has about half a million users. People open the program and see "oh, new content!", which might be a new application, a theme, a ringtone, or something similar. There isn't really /that/ much new content (maybe ten new packages a day, most of them wallpaper and icon themes, stuff people are going to be opinionated about), but people seem to love looking around for it, and over a hundred thousand people check every day. In practice, a new package often sees tens of thousands of downloads within the first hour or so. I'm not quite certain what parallel rating load I will see, but I do expect it to be fairly high. It does seem, though, like I should just go ahead and see how well it all works :(.

-J
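To put the burst described above in rough numbers, here is a back-of-envelope sketch in Python. It converts "tens of thousands of downloads within the first hour" into rating writes per second, then into a minimum shard count. The 30,000-download figure is an illustrative value within that range. The share of downloaders who rate and the per-entity-group write ceiling are hypothetical parameters, not measured figures or documented App Engine limits.

```python
import math


def average_write_rate(downloads, window_seconds, rating_fraction):
    """Rating writes per second, averaged over the whole burst window.

    rating_fraction is a hypothetical share of downloaders who also rate.
    """
    return downloads * rating_fraction / window_seconds


def shards_needed(write_rate, writes_per_group_per_sec):
    """Smallest shard count that keeps each shard under an assumed write ceiling.

    writes_per_group_per_sec is a hypothetical per-entity-group limit,
    not a documented App Engine figure.
    """
    return max(1, math.ceil(write_rate / writes_per_group_per_sec))


# Hypothetical burst: 30,000 downloads of one package in its first hour.
for fraction in (0.1, 1.0):  # 10% of downloaders rate vs. every downloader rates
    rate = average_write_rate(30000, 3600, fraction)
    # Assume (hypothetically) one sustained write per second per entity group.
    shards = shards_needed(rate, 1.0)
    print(f"{fraction:.0%} of downloaders rate: {rate:.2f} writes/s -> {shards} shard(s)")

# 10% of downloaders rate: 0.83 writes/s -> 1 shard(s)
# 100% of downloaders rate: 8.33 writes/s -> 9 shard(s)
```

This averages over the full hour. The first few minutes after a release are probably busier, so the result is best read as a lower bound on peak load rather than a prediction.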
--------------------------------------------------
From: "yejun" <[EMAIL PROTECTED]>
Sent: Sunday, November 02, 2008 10:13 PM
To: "Google App Engine" <google-appengine@googlegroups.com>
Subject: [google-appengine] Re: more complicated counters/ratings (sorting?)

>
> Every time a user rates a product, you update the average rating for that
> product as well. Unless more than a dozen users need to rate a single
> product simultaneously, you don't need to shard the count at all.
>
> On Nov 2, 5:33 pm, "Jay Freeman (saurik)" <[EMAIL PROTECTED]> wrote:
>> Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App
>> Engine", and I've read tons now on entity groups and transactions, but
>> I'm just completely at a loss to come up with an efficient way to model
>> what seems like a very common scenario. :( If anyone is willing to chime
>> in with ideas I'd be really appreciative.
>>
>> Here's the basic idea:
>>
>> a number of products
>> a number of users
>> users rate products 1-5
>> products have an "average rating"
>> users can sort products by rating
>>
>> Equivalent, and closer to the existing examples:
>>
>> a number of blog posts
>> a number of users
>> users post (many, many) comments on blog posts
>> blog posts have a "comment count"
>> users can sort posts by comment count
>>
>> The first problem is that the examples of counters I've seen haven't
>> actually needed to count anything that also needed to be in the
>> datastore. As near as I can tell, to accurately count a set of ratings
>> for a particular product I would need to have an entity group per
>> statistics shard that also contained all of the actual
>> user/product/rating entities.
>> If a user can also update their rating/comment, then the assertion that
>> it is easy to change the number of shards later becomes false: if I were
>> to break it up into five groups (let's say) and get a hundred thousand
>> ratings for some product (very likely in my case), then each group is
>> going to have 20,000 entities in it already (with high contention for
>> those updates, even though they don't affect the count), and I have no
>> real way to (safely) move them between shards later.
>>
>> The second problem is that the existing discussions I've seen of this
>> problem ignore sorting by these statistics. Example: not only might you
>> want the total number of comments posted on a blog, you also might want
>> to find the "most discussed" blog entries (those with the most comments).
>> The mechanisms of using memcache to store a consistent/efficient count
>> thereby no longer work very well: you need to store the information in an
>> indexed entity. Also, you can't afford for the information to ever be out
>> of date: if you have a few thousand products in your product catalog and
>> you don't have an up-to-date count for any of them, it isn't an option to
>> rebuild the master statistics for all of them before running your query.
>> This means you need to be constantly maintaining the updated global
>> counts as you change things, and if you want them to be safe/accurate
>> that will need to be done in a transaction, which pretty much pulls the
>> entire product/post into a single giant entity group.
>>
>> Put these together and it seems like this very website concept simply
>> /requires/ a really slow implementation :(.
>> Specifically: every product/post gets a single entity group, and the
>> comments/ratings are stored in it, with a single master count stored on
>> the product/post entity (updated via transactions as comments/ratings are
>> added), thereby causing massive contention as everyone immediately swoops
>> in to put their comment/rating in. If this is simply "true" then I'll just
>> go ahead and build this and feel "ok" in that I did everything I could,
>> but as long as I have doubts I'm finding it difficult to force myself to
>> lock this scheme into my data ;P.
>>
>> -J
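Here is a minimal sketch of the single-entity-group design described in the quoted message, which also matches yejun's suggestion to update the average on every rating. It is written in Python under clearly hypothetical names: `Product`, `rate`, `by_rating` and the package names are illustrative. A `threading.Lock` stands in for a datastore transaction on the product's entity group, and none of this is App Engine API. The point is the bookkeeping. Keeping a running sum and count, rather than only the average, lets a user change an existing rating by adjusting the sum by the difference, with no recount. The denormalized `average_rating` is the indexed field a sorted query would read, so it is never stale.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical stand-in for a product entity plus its child ratings."""
    name: str
    rating_sum: int = 0
    rating_count: int = 0
    average_rating: float = 0.0  # denormalized; the field a sorted query would use
    ratings: dict = field(default_factory=dict)  # user -> stars, kept in the product's group
    # The lock plays the role of a transaction on the product's entity group.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def rate(product, user, stars):
    """Add or change one user's rating and refresh the product's average.

    The child rating and the parent's totals change together under one lock,
    mirroring a transaction over a single entity group.
    """
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    with product.lock:
        old = product.ratings.get(user)
        if old is None:
            # First rating from this user: it counts once.
            product.rating_count += 1
            product.rating_sum += stars
        else:
            # Changed rating: adjust the sum by the difference; the count is unchanged.
            product.rating_sum += stars - old
        product.ratings[user] = stars
        product.average_rating = product.rating_sum / product.rating_count


def by_rating(products):
    """Sort the way a descending query on average_rating would."""
    return sorted(products, key=lambda p: p.average_rating, reverse=True)


themes = [Product("dark-icons"), Product("ocean-wallpaper")]
rate(themes[0], "alice", 5)
rate(themes[0], "bob", 3)
rate(themes[1], "alice", 4)
rate(themes[0], "bob", 1)  # bob changes his mind: sum 6, count 2, average 3.0
print([(p.name, p.average_rating) for p in by_rating(themes)])
# [('ocean-wallpaper', 4.0), ('dark-icons', 3.0)]
```

Every rating for a given product still serializes on one lock (one entity group), which is exactly the contention the message worries about. The sketch only shows that each update, including a changed rating, is a constant-time adjustment rather than a recount.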