Every time a user rates a product, you update the average rating for that
product as well. Unless more than a dozen users need to rate a single
product simultaneously, you don't need to shard the count at all.
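
A rough sketch of what I mean, in plain Python rather than real datastore
code. Everything here is an assumption for illustration: the Product class,
the rate_product helper, and the lock, which only stands in for a
per-product transaction. Keeping a sum and a count next to the average lets
you handle both a new rating and a changed rating without rereading every
rating.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical stand-in for a product entity holding the aggregates."""
    rating_sum: int = 0
    rating_count: int = 0
    average_rating: float = 0.0
    # Stands in for a per-product transaction; this is not an App Engine API.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def rate_product(product, ratings_by_user, user_id, rating):
    """Record a 1-5 rating and update the stored average in one step."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    with product.lock:
        previous = ratings_by_user.get(user_id)
        if previous is None:
            # First rating from this user: the count grows by one.
            product.rating_count += 1
            product.rating_sum += rating
        else:
            # Changed rating: adjust the sum only, the count stays the same.
            product.rating_sum += rating - previous
        ratings_by_user[user_id] = rating
        product.average_rating = product.rating_sum / product.rating_count
    return product.average_rating
```

Since average_rating is stored on the product itself, sorting products by
rating is an ordinary ordered query on that one property.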

On Nov 2, 5:33 pm, "Jay Freeman (saurik)" <[EMAIL PROTECTED]> wrote:
> Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App 
> Engine", and I've read tons now on entity groups and transactions, but I'm 
> just completely at a loss to come up with an efficient way to model what 
> seems like a very common scenario. :( If anyone is willing to chime in with 
> ideas I'd be really appreciative.
>
> Here's the basic idea:
>
> a number of products
> a number of users
> users rate products 1-5
> products have an "average rating"
> users can sort products by rating
>
> Equivalent, and closer to the existing examples:
>
> a number of blog posts
> a number of users
> users post (many, many) comments on blog posts
> blog posts have a "comment count"
> users can sort posts by comment count
>
> The first problem is that the counter examples I've seen haven't actually 
> needed to count anything that also needed to be in the datastore. As near as 
> I can tell, to accurately count a set of ratings for a particular product I 
> would need to have an entity group per statistics shard that also contained 
> all of the actual user/product/rating entities. If a user also was able to 
> update their rating/comment this means that the assertion that it is easy to 
> later change the number of shards becomes false: if I were to break it up 
> into five groups (let's say) and get a hundred thousand ratings for some 
> product (very likely in my case) then each group is going to have 20,000 
> entities in it already (with high contention for those updates, even though 
> they don't affect the count), and I have no real way to (safely) move them 
> between shards later.
>
> The second problem is that the existing discussions I've seen of this problem 
> ignore sorting by these statistics. Example: not only might you want the 
> total number of comments posted on a blog, you also might want to find blog 
> entries "most discussed" (those with the most comments). The mechanisms of 
> using memcache to store a consistent/efficient count thereby no longer work 
> very well: you need to store the information in an indexed entity. Also, you 
> can't afford for the information to ever be out of date: if you have a few 
> thousand products in your product catalog and you don't have an up-to-date 
> count for any of them, it isn't an option to rebuild the master statistics 
> for all of them before running your query. This means you need to be 
> constantly maintaining the updated global counts as you change things, and if 
> you want them to be safe/accurate that will need to be done in a transaction, 
> which pretty much pulls the entire product/post into a single giant entity 
> group.
>
> Put these together and it seems like this very website concept simply 
> /requires/ a really slow implementation :(. Specifically: every product/post 
> gets a single entity group, and the comments/ratings are stored in it, with a 
> single master count stored on the product/post entity (updated via 
> transactions as comments/ratings are added), thereby causing massive contention 
> as everyone immediately swoops in to put their comment/rating in. If this is 
> simply "true" then I'll just go ahead and build this and feel "ok" in that I 
> did everything I could, but as long as I have doubts I'm finding it difficult 
> to force myself to lock this scheme into my data ;P.
>
> -J