I wouldn't put ratings into the same entity group as the product (or
its shards). If you're going to have 100K ratings per product then, as
you said, this will lead to high contention.

With so many ratings, it will not make much difference if you don't
take every one of them into account, or if the average rating you show
is not real-time.

I would use shards to track the ratings (but without using
transactions) and from time to time re-calculate the average from the
shards and keep it with the product. This would allow indexing by the
average rating, without many sacrifices.
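To make the idea concrete, here is a minimal sketch in plain Python. It
is not App Engine code: the RatingShard and Product classes, NUM_SHARDS,
and both function names are hypothetical stand-ins for datastore
entities, kept in memory only so the logic can be run as-is.

```python
import random

NUM_SHARDS = 5  # hypothetical value, chosen only for illustration


class RatingShard:
    """Stand-in for one datastore entity holding partial totals."""

    def __init__(self):
        self.count = 0  # number of ratings recorded in this shard
        self.total = 0  # sum of the star values recorded in this shard


class Product:
    """Stand-in for the product entity; average_rating would be indexed."""

    def __init__(self, name):
        self.name = name
        self.average_rating = 0.0
        self.shards = [RatingShard() for _ in range(NUM_SHARDS)]


def add_rating(product, stars):
    # Pick a shard at random so concurrent writers rarely hit the same one.
    # The product itself is not touched and no transaction is involved.
    shard = random.choice(product.shards)
    shard.count += 1
    shard.total += stars


def recalculate_average(product):
    # Run only from time to time: sum all shards and store the result on
    # the product, where it can be used for sorting.
    count = sum(s.count for s in product.shards)
    total = sum(s.total for s in product.shards)
    product.average_rating = total / count if count else 0.0
    return product.average_rating
```

The stored average stays stale until recalculate_average() runs, which
is the trade-off described above: slightly out-of-date numbers in
exchange for writes that never contend on the product.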

Now, the interesting part is how to trigger the re-calculation. Until
Google fixes issue 6 there are only two obvious solutions: ping the app
from an external box, or do the processing selectively in the usual
request handlers. An example of the latter would be updating the
average after every X ratings and/or every Y seconds.
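The "every X ratings and/or every Y seconds" check could look roughly
like the sketch below. RecalcPolicy and its parameter names are made up
for illustration, and the counters live in process memory here; on App
Engine they would presumably have to sit in memcache or the datastore,
since requests are not guaranteed to share a process.

```python
import time


class RecalcPolicy:
    """Decides, inside a request handler, whether a recalculation is due."""

    def __init__(self, every_n_ratings=100, every_n_seconds=60.0,
                 clock=time.time):
        self.every_n_ratings = every_n_ratings  # the "X" threshold
        self.every_n_seconds = every_n_seconds  # the "Y" threshold
        self.clock = clock                      # injectable for testing
        self.ratings_since = 0
        self.last_run = clock()

    def record_rating(self):
        """Call once per new rating; returns True if it is time to recalc."""
        self.ratings_since += 1
        now = self.clock()
        due = (self.ratings_since >= self.every_n_ratings
               or now - self.last_run >= self.every_n_seconds)
        if due:
            # Reset both triggers so the next window starts from here.
            self.ratings_since = 0
            self.last_run = now
        return due
```

A handler would call record_rating() after storing the rating and run
the re-calculation only when it returns True, so most requests pay
nothing extra.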

Hope any of this makes sense.

On Nov 3, 9:33 am, "Jay Freeman (saurik)" <[EMAIL PROTECTED]> wrote:
> Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App 
> Engine", and I've read tons now on entity groups and transactions, but I'm 
> just completely at a loss to come up with an efficient way to model what 
> seems like a very common scenario. :( If anyone is willing to chime in with 
> ideas I'd be really appreciative.
>
> Here's the basic idea:
>
> a number of products
> a number of users
> users rate products 1-5
> products have an "average rating"
> users can sort products by rating
>
> Equivalent, and closer to the existing examples:
>
> a number of blog posts
> a number of users
> users post (many, many) comments on blog posts
> blog posts have a "comment count"
> users can sort posts by comment count
>
> The first problem is that the counter examples I've seen haven't actually 
> needed to count anything that also needed to be in the datastore. As near as 
> I can tell, to accurately count a set of ratings for a particular product I 
> would need to have an entity group per statistics shard that also contained 
> all of the actual user/product/rating entities. If a user also was able to 
> update their rating/comment this means that the assertion that it is easy to 
> later change the number of shards becomes false: if I were to break it up 
> into five groups (let's say) and get a hundred thousand ratings for some 
> product (very likely in my case) then each group is going to have 20,000 
> entities in it already (with high contention for those updates, even though 
> they don't affect the count), and I have no real way to (safely) move them 
> between shards later.
>
> The second problem is that the existing discussions I've seen of this problem 
> ignore sorting by these statistics. Example: not only might you want the 
> total number of comments posted on a blog, you also might want to find blog 
> entries "most discussed" (those with the most comments). The mechanisms of 
> using memcache to store a consistent/efficient count thereby no longer work 
> very well: you need to store the information in an indexed entity. Also, you 
> can't afford for the information to ever be out of date: if you have a few 
> thousand products in your product catalog and you don't have an up-to-date 
> count for any of them, it isn't an option to rebuild the master statistics 
> for all of them before running your query. This means you need to be 
> constantly maintaining the updated global counts as you change things, and if 
> you want them to be safe/accurate that will need to be done in a transaction, 
> which pretty much pulls the entire product/post into a single giant entity 
> group.
>
> Put these together and it seems like this very website concept simply 
> /requires/ a really slow implementation :(. Specifically: every product/post 
> gets a single entity group, and the comments/ratings are stored in it, with a 
> single master count stored on the product/post entity (updated via 
> transactions as comments/ratings added); thereby causing massive contention 
> as everyone immediately swoops in to put their comment/rating in. If this is 
> simply "true" then I'll just go ahead and build this and feel "ok" in that I 
> did everything I could, but as long as I have doubts I'm finding it difficult 
> to force myself to lock this scheme into my data ;P.
>
> -J