[appengine-java] Re: GAE - Vote counting system
Sorry to bring this thread back up again, but I've noticed quite a lot of issues being posted to this and other groups about the task queue system failing, and scrolling back through the issues page (http://code.google.com/status/appengine) it's always the task queue that has problems.

Those of you who use it: are you finding it safe enough for production use with the kind of volumes and importance that have been mentioned here? Just wondering whether to refactor my code to this pattern or continue writing too much to the datastore on each request.

Thanks,
Mat.

On Oct 3, 7:34 pm, Jeff Schnitzer j...@infohazard.org wrote:

On Mon, Oct 3, 2011 at 9:24 AM, Mat Jaggard matjagg...@gmail.com wrote:

Jeff - I'm a bit confused. I thought that the whole idea of the datastore was that you could read or write as much as you want, as fast as you want, as long as they are not related? So one datastore write per vote (each written to a different entity group) should be fine? I thought that the system just split tablets if they were being accessed too much, so as long as the traffic didn't suddenly increase, there'd be no scalability issues apart from cost.

"Apart from cost", he says :-)

The OP posited millions of users and millions of things to vote for. Each million votes will cost you (at minimum) $1.70 for one write + one read, but it'll probably be more depending on how many page views you have and what caching strategy you have. Still, maybe this is no big deal.

The bigger problem, though, is that vote traffic is likely to be focused on a handful of items. Popular things might get thousands of votes per second; unpopular things won't be voted for at all. It's hard to come up with a sharding strategy that works well for this. You probably don't want 1k shards for everything: storage costs go up, and the expense and latency of calculating totals go up.

I have to deal with a similar problem myself right now (with the added constraint that I need an instantaneously precise count). I'm considering a system that automatically tracks latency and increases the shard count when it crosses a threshold. It's not a pretty problem to solve.

Jeff

--
You received this message because you are subscribed to the Google Groups Google App Engine for Java group.
To post to this group, send email to google-appengine-java@googlegroups.com.
To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
[appengine-java] Re: GAE - Vote counting system
Jeff - I'm a bit confused. I thought that the whole idea of the datastore was that you could read or write as much as you want, as fast as you want, as long as they are not related? So one datastore write per vote (each written to a different entity group) should be fine? I thought that the system just split tablets if they were being accessed too much, so as long as the traffic didn't suddenly increase, there'd be no scalability issues apart from cost.

Have I misunderstood?

Thanks,
Mat.

On Sep 30, 9:15 pm, Jeff Schnitzer j...@infohazard.org wrote:

Assuming the goal is massive write throughput, you don't want to do 1 write per vote. You need to aggregate writes into a batch. You can do that with pull queues, but then you're limited to the maximum throughput of a pull queue. And the biggest batch size is 1,000, which might actually be votes for 1,000 different things, which means you're back to 1-vote-1-write.

Peter, you can certainly build a system whereby all vote totals are tracked in RAM in a backend, but now you're putting higher memory requirements on the backend, and life gets more complicated when you deal with sharding.

Depending on how exact you need the counts to be, you can always use increment() on memcache in addition to incrementing the backend. The only catch is that bootstrapping the initial memcache value will be a little tricky: you'll need to use CAS and query the backends for any existing deltas. Or just not care if it's off by a few.

Jeff
Re: [appengine-java] Re: GAE - Vote counting system
On Mon, Oct 3, 2011 at 9:24 AM, Mat Jaggard matjagg...@gmail.com wrote:

Jeff - I'm a bit confused. I thought that the whole idea of the datastore was that you could read or write as much as you want, as fast as you want, as long as they are not related? So one datastore write per vote (each written to a different entity group) should be fine? I thought that the system just split tablets if they were being accessed too much, so as long as the traffic didn't suddenly increase, there'd be no scalability issues apart from cost.

"Apart from cost", he says :-)

The OP posited millions of users and millions of things to vote for. Each million votes will cost you (at minimum) $1.70 for one write + one read, but it'll probably be more depending on how many page views you have and what caching strategy you have. Still, maybe this is no big deal.

The bigger problem, though, is that vote traffic is likely to be focused on a handful of items. Popular things might get thousands of votes per second; unpopular things won't be voted for at all. It's hard to come up with a sharding strategy that works well for this. You probably don't want 1k shards for everything: storage costs go up, and the expense and latency of calculating totals go up.

I have to deal with a similar problem myself right now (with the added constraint that I need an instantaneously precise count). I'm considering a system that automatically tracks latency and increases the shard count when it crosses a threshold. It's not a pretty problem to solve.

Jeff
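Jeff only says he is considering a latency-driven shard count, so the following is a guess at what such a controller might look like, not his design. It is a plain-Java sketch with no App Engine calls; the class name, the smoothed-average approach, and the doubling policy are all assumptions made for illustration.

```java
// Hypothetical controller: raises the shard count for one hot counter
// when the smoothed write latency crosses a threshold.
class AdaptiveShardCount {
    private final double thresholdMillis; // latency above which we add shards
    private final int maxShards;          // upper bound on shards per counter
    private final double alpha;           // smoothing weight for new samples (0..1)
    private double avgLatencyMillis;
    private boolean seeded;
    private int shardCount = 1;

    AdaptiveShardCount(double thresholdMillis, int maxShards, double alpha) {
        this.thresholdMillis = thresholdMillis;
        this.maxShards = maxShards;
        this.alpha = alpha;
    }

    // Feed in the latency of one counter write; returns the shard count to use next.
    synchronized int recordWriteLatency(double millis) {
        // Exponentially weighted moving average; the first sample seeds it.
        avgLatencyMillis = seeded
                ? alpha * millis + (1 - alpha) * avgLatencyMillis
                : millis;
        seeded = true;
        if (avgLatencyMillis > thresholdMillis && shardCount < maxShards) {
            shardCount = Math.min(maxShards, shardCount * 2);
            seeded = false; // restart the average after resharding
        }
        return shardCount;
    }
}
```

A sketch like this only ever grows the shard count; shrinking it again for items that cool off, and keeping reads correct while the count changes, are the parts that make the real problem "not pretty".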
[appengine-java] Re: GAE - Vote counting system
After each vote we want to send back the current state of the voted object (its current vote count)... so we need to store the number of votes and not only the deltas.

We could store the current state of the votes in the backends' cache and write the changes to the DB in batches.

What do you think about this solution? I appreciate your answers!

Thx
[appengine-java] Re: GAE - Vote counting system
Will that end up being cheaper than the following?

- Put a new entity in the datastore for each vote (Kind: Vote, ID: auto-generated, Vote for: Item S).
- Have a task queue task query all the votes, delete them, then write the count of votes to a global object.

Cost = 1 datastore read + 1 datastore write + some fairly minor task processing per vote.

On Sep 29, 1:47 pm, Peter Dev dev133...@gmail.com wrote:

Price:
- with backends, let's say 3 B2 machines = 350 USD/month
- UrlFetch data sent/received: 0.15 USD/GB

Limit:
- URL Fetch daily limit: 46,000,000 calls. This can be a problem... but I see it is possible to request an increase.

Write data in parallel to the DB:
A task queue running every 30 seconds could be a solution (check timestamps in the cache and write to the DB).

RESET counters = empty the cache in the backends, reset the counter of the object in the DB.

Backends cache = HashMap with sharded counter values, or counter values without sharding (just incrementing a value in a Java HashMap is fast enough).

With backends we don't need sharding, I think... what do you think?

Thx.
Re: [appengine-java] Re: GAE - Vote counting system
Assuming the goal is massive write throughput, you don't want to do 1 write per vote. You need to aggregate writes into a batch. You can do that with pull queues, but then you're limited to the maximum throughput of a pull queue. And the biggest batch size is 1,000, which might actually be votes for 1,000 different things, which means you're back to 1-vote-1-write.

Peter, you can certainly build a system whereby all vote totals are tracked in RAM in a backend, but now you're putting higher memory requirements on the backend, and life gets more complicated when you deal with sharding.

Depending on how exact you need the counts to be, you can always use increment() on memcache in addition to incrementing the backend. The only catch is that bootstrapping the initial memcache value will be a little tricky: you'll need to use CAS and query the backends for any existing deltas. Or just not care if it's off by a few.

Jeff

On Thu, Sep 29, 2011 at 5:56 AM, Mat Jaggard matjagg...@gmail.com wrote:

Will that end up being cheaper than the following?

- Put a new entity in the datastore for each vote (Kind: Vote, ID: auto-generated, Vote for: Item S).
- Have a task queue task query all the votes, delete them, then write the count of votes to a global object.

Cost = 1 datastore read + 1 datastore write + some fairly minor task processing per vote.
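Jeff's point about pull-queue batches can be made concrete with a small sketch. It uses plain Java collections rather than the App Engine queue API; the class and method names are hypothetical, and the list of item ids stands in for one leased batch of vote tasks. The number of entries in the result is the number of counter writes the batch would need.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: folds one leased batch of votes into per-item deltas.
class VoteBatch {
    // Each list element is the id of the item one vote was cast for.
    static Map<String, Long> aggregate(List<String> votedItemIds) {
        Map<String, Long> deltas = new HashMap<>();
        for (String itemId : votedItemIds) {
            deltas.merge(itemId, 1L, Long::sum); // one entry per distinct item
        }
        return deltas;
    }
}
```

If a batch of 1,000 votes hits 1,000 different items, the map has 1,000 entries and batching saves nothing, which is the "back to 1-vote-1-write" case; if the votes concentrate on a few popular items, the same batch collapses to a handful of writes.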
[appengine-java] Re: GAE - Vote counting system
Price:
- with backends, let's say 3 B2 machines = 350 USD/month
- UrlFetch data sent/received: 0.15 USD/GB

Limit:
- URL Fetch daily limit: 46,000,000 calls. This can be a problem... but I see it is possible to request an increase.

Write data in parallel to the DB:
A task queue running every 30 seconds could be a solution (check timestamps in the cache and write to the DB).

RESET counters = empty the cache in the backends, reset the counter of the object in the DB.

Backends cache = HashMap with sharded counter values, or counter values without sharding (just incrementing a value in a Java HashMap is fast enough).

With backends we don't need sharding, I think... what do you think?

Thx.
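To see why the URL Fetch limit "can be a problem", it helps to turn the daily quota into a sustained rate. The sketch below assumes, purely for illustration, that every vote costs exactly one URL Fetch call from a frontend to a backend; the class and method names are hypothetical.

```java
// Hypothetical helper: converts a daily call quota into an average per-second rate.
class UrlFetchBudget {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    static double sustainedCallsPerSecond(long dailyCallLimit) {
        return (double) dailyCallLimit / SECONDS_PER_DAY;
    }
}
```

With the 46,000,000 calls/day figure quoted above, this works out to roughly 532 calls per second averaged over the day, which is well below the "thousands of votes per second" that a popular item might attract.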
Re: [appengine-java] Re: GAE - Vote counting system
It's hard to say here if we're talking about the same thing, but here's how I would do it:

* Updates go through to the backend, which stores write deltas in RAM (not the total).
* Reads read through memcache into the datastore.
* The backend writes deltas to the datastore in batch, updating memcache, then purging the delta from memory.
* The backends can write once every 5 mins or 30s or however long you're comfortable having read data be stale. More freshness == more datastore expense. It's a simple dial.

This system is not limited by backend RAM, since each backend stores only deltas; you can probably run the smallest size. It won't be limited by read volume, which will come almost entirely from memcache. It will be limited by max request throughput on the backend. Given the update is practically a no-op, how many QPS can a single backend serve? That's your limit (times the number of backends running).

I think you'd be hard-pressed to find a better solution to this problem on GAE. It does require that reads be allowed to go stale, within a controlled bound, though.

Jeff

On Thu, Sep 29, 2011 at 5:47 AM, Peter Dev dev133...@gmail.com wrote:

Price:
- with backends, let's say 3 B2 machines = 350 USD/month
- UrlFetch data sent/received: 0.15 USD/GB

Limit:
- URL Fetch daily limit: 46,000,000 calls. This can be a problem... but I see it is possible to request an increase.

Write data in parallel to the DB:
A task queue running every 30 seconds could be a solution (check timestamps in the cache and write to the DB).

RESET counters = empty the cache in the backends, reset the counter of the object in the DB.

Backends cache = HashMap with sharded counter values, or counter values without sharding (just incrementing a value in a Java HashMap is fast enough).

With backends we don't need sharding, I think... what do you think?

Thx.
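The delta-buffering scheme Jeff outlines in this message might look roughly like the sketch below. It is plain Java with no App Engine calls: the class and method names are hypothetical, and the `totals` map passed to `flushInto` is only a stand-in for the datastore and memcache update that a real backend would perform on its flush interval.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory buffer held by one backend: it stores only
// unflushed deltas per item, never the running totals.
class VoteDeltaBuffer {
    private final ConcurrentHashMap<String, Long> deltas = new ConcurrentHashMap<>();

    // Called once per vote; touches memory only.
    void recordVote(String itemId) {
        deltas.merge(itemId, 1L, Long::sum); // atomic per key
    }

    // Removes and returns everything accumulated so far. A vote that races
    // with the removal either lands in this snapshot or starts a fresh
    // entry for the next flush, so it is not lost.
    Map<String, Long> drain() {
        Map<String, Long> snapshot = new HashMap<>();
        for (String itemId : deltas.keySet()) {
            Long delta = deltas.remove(itemId);
            if (delta != null) {
                snapshot.put(itemId, delta);
            }
        }
        return snapshot;
    }

    // Periodic flush: adds each delta to the stored total.
    // 'totals' stands in for the datastore + memcache write.
    void flushInto(Map<String, Long> totals) {
        drain().forEach((itemId, delta) -> totals.merge(itemId, delta, Long::sum));
    }
}
```

The flush interval is the "dial" from the message: calling `flushInto` more often makes reads fresher at the cost of more writes, while votes recorded since the last flush are exactly what would be lost if the backend crashed.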
[appengine-java] Re: GAE - Vote counting system
Many writes to the same object will lead to DB failures. You really should consider sharding: http://code.google.com/appengine/articles/sharding_counters.html

On Sep 26, 12:41 am, Peter Dev dev133...@gmail.com wrote:

We are developing an application where users can vote for many objects (for example, voting for the best music video of the week).

- This means we have millions of possible objects to vote for, and millions of users.

To the best of our knowledge, after taking different options into consideration, the best (or the only) voting system is: Memcache + bulk write to the DB.

- If the number of objects in Memcache reaches a specified limit (for example 3000), then write to the DB.

The writing speed into the DB is about 100/sec. This means that with the above-mentioned 3000 objects, the write would last 30 sec...

The problem: during the save to the DB, voting must be blocked. In other words, whenever 3000 voted objects (out of many millions) have accumulated, we need to write them into the DB. This can happen many times, blocking the whole voting mechanism each time.

If we do not block voting while writing to the DB, the result could be a wrong number of votes from the cache (see Workflow, step 3).

Workflow:
1. vote received
2. find object in memcache
3. if not found in memcache, get it from the DB and put it in
4. increment the number of votes of the object in memcache
5. check the number of objects in memcache
6. if necessary, save to the DB and empty memcache

...
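For readers who have not seen the sharding idea suggested at the top of this message, here is a minimal in-memory sketch of it. It is not the code from the linked article: the class name is hypothetical, and each array slot stands in for what would be a separate shard entity in the datastore.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical in-memory analogue of a sharded counter: writes are spread
// over several slots so no single slot takes all the contention.
class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int shardCount) {
        shards = new AtomicLongArray(shardCount);
    }

    // A write picks one shard at random and increments only that shard.
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(index);
    }

    // A read has to visit every shard and sum them.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade-off is visible in the two methods: more shards mean less write contention per shard, but every read of the total costs one lookup per shard.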
[appengine-java] Re: GAE - Vote counting system
Sharded counters are cool and I use them... but if you have millions of objects I cannot imagine how to manage them.

1 000 000 obj x 100 shards = 10 000 000 counters

1. How to reset them to 0 in specified periods?
2. How to compute the sharded sum for each object to show the top 100 objects?
3. Too many DB API calls (each vote makes a write in the DB)

Any ideas...?

Thx
[appengine-java] Re: GAE - Vote counting system
Sorry, 100 000 000 counters

On Sep 27, 4:53 pm, Peter Dev dev133...@gmail.com wrote:

Sharded counters are cool and I use them... but if you have millions of objects I cannot imagine how to manage them.

1 000 000 obj x 100 shards = 10 000 000 counters

1. How to reset them to 0 in specified periods?
2. How to compute the sharded sum for each object to show the top 100 objects?
3. Too many DB API calls (each vote makes a write in the DB)

Any ideas...?

Thx
Re: [appengine-java] Re: GAE - Vote counting system
Yeah, messy. I'd use a backend for this, possibly a set of backends if you need to shard the data for write volume. I'd use memcache only to cache the count reads.

The basic entity is just an id and a count. An increment request goes to a backend, which simply tracks the change. A batch process goes through every 5 minutes, writes any changed counts to both the datastore and memcache (as an update: read, increment, write in a transaction), and clears the in-memory count.

If write volume is too high for a single backend to handle, shard it by thingId % number of shards. You can change the shard count on the fly this way.

All reads should come from memcache, reading through to the datastore as necessary. This should be able to handle any volume you want. If a backend crashes you'll lose its accumulated counts, but I presume that's not a big deal.

Jeff

On Tue, Sep 27, 2011 at 10:38 AM, Peter Dev dev133...@gmail.com wrote:

Sorry, 100 000 000 counters
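The "thingId % number of shards" routing from Jeff's message can be sketched in a few lines. The class and method names are hypothetical, and how a shard index maps to an actual backend instance is left out because the thread does not specify it.

```java
// Hypothetical router: picks which backend shard receives an increment.
class BackendRouter {
    static int shardFor(long thingId, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative ids.
        return (int) Math.floorMod(thingId, (long) shardCount);
    }
}
```

One plausible reading of why the shard count can change on the fly: each backend holds only unflushed deltas that are later added to the stored total, so it should not matter which backend happened to collect a given item's votes before the change.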