Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App 
Engine", and I've read tons now on entity groups and transactions, but I'm just 
completely at a loss to come up with an efficient way to model what seems like 
a very common scenario. :( If anyone is willing to chime in with ideas I'd be 
really appreciative.

Here's the basic idea:

a number of products
a number of users
users rate products 1-5
products have an "average rating"
users can sort products by rating

Equivalently, and closer to the existing examples:

a number of blog posts
a number of users
users post (many, many) comments on blog posts
blog posts have a "comment count"
users can sort posts by comment count

The first problem is that the sharded counter examples I've seen haven't
actually needed to count anything that also had to live in the datastore. As
near as I can tell, to accurately count the ratings for a particular product I
would need one entity group per statistics shard, and that group would also
have to contain all of the actual user/product/rating entities. If a user can
also update their rating/comment, the claim that it is easy to change the
number of shards later becomes false: if I were to break a product up into five
groups (let's say) and get a hundred thousand ratings for it (very likely in my
case), then each group would already have 20,000 entities in it (with high
contention for those updates, even though they don't affect the count), and I
would have no real way to (safely) move them between shards later.
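
To make the first problem concrete, here is a rough in-memory sketch of the
sharded layout I have in mind. It is plain Python, not the datastore API: the
names ShardedRatings and shard_index are made up for illustration, each dict
merely stands in for one entity group, and picking the shard by hashing the
user id is my own assumption about how a user's earlier rating would be found
again on update.

```python
import zlib

NUM_SHARDS = 5  # hypothetical shard count, matching the example above


def shard_index(user, num_shards):
    """Deterministically map a user to a shard (assumed scheme)."""
    return zlib.crc32(user.encode("utf-8")) % num_shards


class ShardedRatings:
    """In-memory stand-in for one product's rating shards."""

    def __init__(self, num_shards=NUM_SHARDS):
        # Each shard plays the role of one entity group: it holds the
        # individual ratings plus the running totals for those ratings.
        self.shards = [{"ratings": {}, "total": 0, "count": 0}
                       for _ in range(num_shards)]

    def rate(self, user, stars):
        # An update has to land in the shard holding the user's old rating,
        # so the rating and the shard totals can change together.
        shard = self.shards[shard_index(user, len(self.shards))]
        old = shard["ratings"].get(user)
        if old is None:
            shard["count"] += 1
        else:
            shard["total"] -= old
        shard["ratings"][user] = stars
        shard["total"] += stars

    def average(self):
        # Reading the average means summing over every shard.
        total = sum(s["total"] for s in self.shards)
        count = sum(s["count"] for s in self.shards)
        return total / count if count else None
```

With this layout, shard_index("alice", 5) and shard_index("alice", 6) will
generally disagree, so changing the shard count strands every existing rating
in the "wrong" group unless they are all moved. That is the resharding problem
described above.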

The second problem is that the existing discussions I've seen of this problem 
ignore sorting by these statistics. Example: not only might you want the total 
number of comments posted on a blog, you also might want to find blog entries 
"most discussed" (those with the most comments). The approach of using
memcache to keep a consistent/efficient count then no longer works very
well: to sort on the count, it has to be stored in an indexed entity. Also, you can't
afford for the information to ever be out of date: if you have a few thousand 
products in your product catalog and you don't have an up-to-date count for any 
of them, it isn't an option to rebuild the master statistics for all of them 
before running your query. This means you need to be constantly maintaining the 
updated global counts as you change things, and if you want them to be 
safe/accurate that will need to be done in a transaction, which pretty much 
pulls the entire product/post into a single giant entity group.

Put these together and it seems like this very website concept simply 
/requires/ a really slow implementation :(. Specifically: every product/post 
gets a single entity group, and the comments/ratings are stored in it, with a 
single master count stored on the product/post entity (updated via transactions
as comments/ratings are added), which causes massive contention as everyone
immediately swoops in to put their comment/rating in. If this is simply "true"
then I'll just go ahead and build this and feel "ok" in that I did everything I 
could, but as long as I have doubts I'm finding it difficult to force myself to 
lock this scheme into my data ;P.
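
For reference, the scheme I'm reluctant to commit to looks roughly like the
sketch below. Again this is plain Python rather than datastore code: Product
and top_rated are hypothetical names, and the threading.Lock is only a stand-in
for the per-entity-group transaction (as far as I understand, the real
datastore retries conflicting transactions rather than blocking, but either way
writers to one product are serialized).

```python
import threading


class Product:
    """One product standing in for one entity group (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.ratings = {}       # user -> stars, i.e. the child rating entities
        self.rating_sum = 0
        self.rating_count = 0
        self.average = 0.0      # the stored property a query would sort on
        self._lock = threading.Lock()  # stand-in for the group's transaction

    def rate(self, user, stars):
        # Every writer for this product serializes here: the contention point.
        with self._lock:
            old = self.ratings.get(user)
            if old is None:
                self.rating_count += 1
            else:
                self.rating_sum -= old
            self.ratings[user] = stars
            self.rating_sum += stars
            # Keep the master statistic current so sorting never needs a
            # rebuild across all ratings.
            self.average = self.rating_sum / self.rating_count


def top_rated(products):
    """Sort by the stored average, as an indexed query would."""
    return sorted(products, key=lambda p: p.average, reverse=True)
```

The average is always current and sortable, which is what I need, but every
rating for a popular product goes through that one serialized section.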

-J
