Yeah. The problem is that I actually bet that will happen ;P. The 
application in question (something akin to Apple's App Store or Google's 
Android Market) has about half a million users. People open the program and 
see "oh, new content!", which might be either a new application or a 
theme/ringtone or something. There isn't really /that/ much new content
(maybe ten new packages a day, most of which are wallpaper and icon
themes, stuff people are going to be opinionated about), but people seem to
love looking around for it, and over a hundred thousand people check every
day. A new package often gets tens of thousands of downloads
within the first hour or so. I am not quite certain what I will see from the
parallel rating load, but I do expect it to be somewhat high. It does seem, 
though, like I should just go ahead and see how well it all works :(. -J
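
If the parallel rating load does turn out to be high, one way to spread the
writes is to split each product's tally across several shards. What follows
is only a minimal in-memory sketch of that idea, not App Engine datastore
code: the class name, the method names and the shard count of five are all
hypothetical, and each list entry stands in for what would be a separate
entity.

```python
import random

NUM_SHARDS = 5  # hypothetical; five is just the example figure from the thread


class ShardedRatingTally:
    """In-memory stand-in for one product's rating tally split across shards."""

    def __init__(self, num_shards=NUM_SHARDS, rng=None):
        # Each shard holds [rating_sum, rating_count]. In a real datastore each
        # shard would be its own entity, so concurrent writers rarely collide.
        self.shards = [[0, 0] for _ in range(num_shards)]
        self.rng = rng or random.Random()

    def add_rating(self, stars):
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        shard = self.rng.choice(self.shards)  # a writer touches one shard only
        shard[0] += stars
        shard[1] += 1

    def totals(self):
        # Readers add up every shard to recover the overall sum and count.
        total_sum = sum(shard[0] for shard in self.shards)
        total_count = sum(shard[1] for shard in self.shards)
        return total_sum, total_count

    def average(self):
        total_sum, total_count = self.totals()
        if total_count == 0:
            return None  # no ratings yet
        return float(total_sum) / total_count
```

Note that this only spreads write contention. It does not by itself produce
a single stored value to sort on, which is the second problem raised in the
quoted message below.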

--------------------------------------------------
From: "yejun" <[EMAIL PROTECTED]>
Sent: Sunday, November 02, 2008 10:13 PM
To: "Google App Engine" <google-appengine@googlegroups.com>
Subject: [google-appengine] Re: more complicated counters/ratings (sorting?)

>
> Every time a user rates a product, you update the average rating for that
> product as well. Unless more than a dozen users need to rate a single
> product simultaneously, you don't need to shard the count at all.
>
> On Nov 2, 5:33 pm, "Jay Freeman (saurik)" <[EMAIL PROTECTED]> wrote:
>> Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App 
>> Engine", and I've read tons now on entity groups and transactions, but 
>> I'm just completely at a loss to come up with an efficient way to model 
>> what seems like a very common scenario. :( If anyone is willing to chime 
>> in with ideas I'd be really appreciative.
>>
>> Here's the basic idea:
>>
>> a number of products
>> a number of users
>> users rate products 1-5
>> products have an "average rating"
>> users can sort products by rating
>>
>> Equivalent, and closer to the existing examples:
>>
>> a number of blog posts
>> a number of users
>> users post (many, many) comments on blog posts
>> blog posts have a "comment count"
>> users can sort posts by comment count
>>
>> The first problem is that the sharded-counter examples I've seen haven't
>> actually needed to count anything that also needed to be in the datastore.
>> As near as I can tell, to accurately count a set of ratings for a particular
>> product I would need to have an entity group per statistics shard that 
>> also contained all of the actual user/product/rating entities. If a user 
>> also was able to update their rating/comment this means that the 
>> assertion that it is easy to later change the number of shards becomes 
>> false: if I were to break it up into five groups (let's say) and get a 
>> hundred thousand ratings for some product (very likely in my case) then 
>> each group is going to have 20,000 entities in it already (with high 
>> contention for those updates, even though they don't affect the count), 
>> and I have no real way to (safely) move them between shards later.
>>
>> The second problem is that the existing discussions I've seen of this 
>> problem ignore sorting by these statistics. Example: not only might you 
>> want the total number of comments posted on a blog, you also might want 
>> to find blog entries "most discussed" (those with the most comments). The 
>> mechanisms of using memcache to store a consistent/efficient count 
>> thereby no longer work very well: you need to store the information in an 
>> indexed entity. Also, you can't afford for the information to ever be out 
>> of date: if you have a few thousand products in your product catalog and 
>> you don't have an up-to-date count for any of them, it isn't an option to 
>> rebuild the master statistics for all of them before running your query. 
>> This means you need to be constantly maintaining the updated global 
>> counts as you change things, and if you want them to be safe/accurate 
>> that will need to be done in a transaction, which pretty much pulls the 
>> entire product/post into a single giant entity group.
>>
>> Put these together and it seems like this very website concept simply 
>> /requires/ a really slow implementation :(. Specifically: every 
>> product/post gets a single entity group, and the comments/ratings are 
>> stored in it, with a single master count stored on the product/post 
>> entity (updated via transactions as comments/ratings are added); thereby
>> causing massive contention as everyone immediately swoops in to put their 
>> comment/rating in. If this is simply "true" then I'll just go ahead and 
>> build this and feel "ok" in that I did everything I could, but as long as 
>> I have doubts I'm finding it difficult to force myself to lock this 
>> scheme into my data ;P.
>>
>> -J 
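
For reference, the single-group scheme described in the last quoted
paragraph (one master count per product, updated whenever a rating is added
or changed, with the average stored so it can be sorted on) might look
roughly like this. It is a plain in-memory sketch under stated assumptions,
not datastore code: `Product`, `rate` and `top_rated` are hypothetical
names, and the body of `rate` stands in for what would be one transaction
on the product's entity group.

```python
class Product:
    """In-memory stand-in for a product entity plus its rating entities."""

    def __init__(self, name):
        self.name = name
        self.ratings = {}      # user id -> stars (the per-user rating records)
        self.rating_sum = 0    # master statistics kept on the product itself
        self.rating_count = 0
        self.average = 0.0     # stored value, so a query could sort on it

    def rate(self, user, stars):
        # Everything in this method would have to happen in one transaction.
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        previous = self.ratings.get(user)
        if previous is None:
            # First rating from this user: both sum and count change.
            self.rating_sum += stars
            self.rating_count += 1
        else:
            # Changed rating: adjust the sum only, the count stays the same.
            self.rating_sum += stars - previous
        self.ratings[user] = stars
        self.average = float(self.rating_sum) / self.rating_count


def top_rated(products, limit=10):
    # Stand-in for a query ordered by the stored average, highest first.
    return sorted(products, key=lambda p: p.average, reverse=True)[:limit]
```

A short usage example:

```python
a, b = Product("theme-a"), Product("theme-b")
a.rate("u1", 5)
a.rate("u2", 3)
b.rate("u1", 4)
a.rate("u2", 1)  # u2 changes their mind; the count stays at 2
print([p.name for p in top_rated([a, b])])
```

Because every call to `rate` for a given product updates the same stored
statistics, all raters of a popular product contend on one group, which is
exactly the cost the message worries about.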


