[appengine-java] Re: GAE - Vote counting system

2011-10-11 Thread Mat Jaggard
Sorry to bring this thread back up again, but I've noticed quite a lot
of issues being posted to this and other groups about the task queue
system failing, and scrolling back through the issues page
(http://code.google.com/status/appengine), it's always the task queue that has
problems. Those of you who use it: are you finding it safe enough for
production use with the kind of volumes and importance that have been
mentioned here? I'm just wondering whether to refactor my code to this
pattern or continue writing too much to the datastore on each request.

Thanks,
Mat.

On Oct 3, 7:34 pm, Jeff Schnitzer j...@infohazard.org wrote:
 On Mon, Oct 3, 2011 at 9:24 AM, Mat Jaggard matjagg...@gmail.com wrote:
  Jeff - I'm a bit confused. I thought that the whole idea of the
  datastore was that you could read or write as much as you want, as
  fast as you want as long as they are not related? So one datastore
  write per vote (and being written to different entity groups) should
  be fine? I thought that the system just split tablets if they were
  being accessed too much - so as long as the traffic didn't suddenly
  increase, there'd be no scalability issues apart from cost.

 apart from cost he says :-)

 The OP posited millions of users and millions of things to vote for.
 Each million votes will cost you (at minimum) $1.70 for one write +
 one read, but it'll probably be more depending on how many page views
 you have and what caching strategy you have.  Still, maybe this is no
 big deal.

 The bigger problem though is that vote traffic is likely to be focused
 on a handful of items.  Popular things might get thousands of votes
 per second, unpopular things won't be voted for at all.  It's hard to
 come up with a sharding strategy that works well for this; you
 probably don't want 1k shards for everything, since storage costs go up and
 so do the expense and latency of calculating totals.

 I have to deal with a similar problem myself right now (with the added
 constraint that I need an instantaneously precise count).  I'm
 considering a system that automatically tracks latency and increases
 the shard count when it crosses a threshold.  It's not a pretty
 problem to solve.

 Jeff

-- 
You received this message because you are subscribed to the Google Groups 
Google App Engine for Java group.
To post to this group, send email to google-appengine-java@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en.



[appengine-java] Re: GAE - Vote counting system

2011-10-03 Thread Mat Jaggard
Jeff - I'm a bit confused. I thought that the whole idea of the
datastore was that you could read or write as much as you want, as
fast as you want as long as they are not related? So one datastore
write per vote (and being written to different entity groups) should
be fine? I thought that the system just split tablets if they were
being accessed too much - so as long as the traffic didn't suddenly
increase, there'd be no scalability issues apart from cost.

Have I misunderstood?

Thanks,
Mat.

On Sep 30, 9:15 pm, Jeff Schnitzer j...@infohazard.org wrote:
 Assuming the goal is massive write throughput, you don't want to do 1
 write per vote.  You need to aggregate writes into a batch - you can
 do that with pull queues, but then you're limited to the maximum
 throughput of a pull queue.  And the biggest batch size is 1,000
 which might actually be votes for 1,000 different things, which means
 you're back to 1-vote-1-write.

 Peter, you can certainly build a system whereby all vote totals are
 tracked in RAM in a backend but now you're putting higher memory
 requirements on the backend and life gets more complicated when you
 deal with sharding.

 Depending on how exact you need the counts to be, you can always use
 increment() on memcache in addition to incrementing the backend.  The
 only catch is that bootstrapping the initial memcache value will be a
 little tricky - you'll need to use CAS and query the backends for any
 existing deltas.  Or just not care if it's off by a few.

 Jeff




Re: [appengine-java] Re: GAE - Vote counting system

2011-10-03 Thread Jeff Schnitzer
On Mon, Oct 3, 2011 at 9:24 AM, Mat Jaggard matjagg...@gmail.com wrote:
 Jeff - I'm a bit confused. I thought that the whole idea of the
 datastore was that you could read or write as much as you want, as
 fast as you want as long as they are not related? So one datastore
 write per vote (and being written to different entity groups) should
 be fine? I thought that the system just split tablets if they were
 being accessed too much - so as long as the traffic didn't suddenly
 increase, there'd be no scalability issues apart from cost.

apart from cost he says :-)

The OP posited millions of users and millions of things to vote for.
Each million votes will cost you (at minimum) $1.70 for one write +
one read, but it'll probably be more depending on how many page views
you have and what caching strategy you have.  Still, maybe this is no
big deal.

The bigger problem though is that vote traffic is likely to be focused
on a handful of items.  Popular things might get thousands of votes
per second, unpopular things won't be voted for at all.  It's hard to
come up with a sharding strategy that works well for this; you
probably don't want 1k shards for everything, since storage costs go up and
so do the expense and latency of calculating totals.

I have to deal with a similar problem myself right now (with the added
constraint that I need an instantaneously precise count).  I'm
considering a system that automatically tracks latency and increases
the shard count when it crosses a threshold.  It's not a pretty
problem to solve.
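Jeff only describes this idea in a sentence. A minimal sketch of such a growth rule follows; the moving average of write latency, the doubling step, and all names are assumptions for illustration, not something stated in the thread.

```java
/**
 * Hypothetical sketch: grow a counter's shard count when the smoothed write
 * latency crosses a threshold. All names and numbers are illustrative.
 */
class AdaptiveShardPolicy {
    private final int maxShards;
    private final double thresholdMillis;
    private final double alpha;      // weight of the newest sample, 0 < alpha <= 1
    private double avgLatencyMillis; // exponentially weighted moving average of write latency
    private int shards;

    AdaptiveShardPolicy(int initialShards, int maxShards, double thresholdMillis, double alpha) {
        if (initialShards < 1 || maxShards < initialShards || alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("bad policy parameters");
        }
        this.shards = initialShards;
        this.maxShards = maxShards;
        this.thresholdMillis = thresholdMillis;
        this.alpha = alpha;
    }

    /** Records one observed write latency and returns the (possibly grown) shard count. */
    synchronized int recordLatency(double millis) {
        avgLatencyMillis = alpha * millis + (1 - alpha) * avgLatencyMillis;
        if (avgLatencyMillis > thresholdMillis && shards < maxShards) {
            shards = Math.min(maxShards, shards * 2); // double, capped at maxShards
            avgLatencyMillis = 0;                     // start measuring the new layout afresh
        }
        return shards;
    }

    synchronized int shardCount() {
        return shards;
    }
}
```

Readers computing a total would then have to sum every shard up to the current count. Shrinking is left out, since it would mean folding shard values back together.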

Jeff




[appengine-java] Re: GAE - Vote counting system

2011-09-30 Thread Peter Dev
After each vote we want to send back the current state of the voted object
(its current vote count), so we need to store the total number of votes and not
only the deltas.
We could store the current vote totals in the backend cache, and write
changes to the DB in batches.
What do you think about this solution?
I appreciate your answers! Thx




[appengine-java] Re: GAE - Vote counting system

2011-09-30 Thread Mat Jaggard
Will that end up being cheaper than the following?

Put a new entity in the datastore for each vote - Kind: Vote, ID:
Auto generated, Vote for: Item S
Have a task queue query all the votes, delete them, then write the
count of votes to a global object.

Cost = 1 datastore read + 1 datastore write + some fairly minor task
processing per vote.
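Mat's proposal can be sketched in memory. The maps below are hypothetical stand-ins for the datastore so the example runs without the App Engine SDK. The detail worth illustrating is that the aggregation task deletes only the votes it actually counted, because new votes keep arriving while it runs (and on the datastore, a non-ancestor query may not yet return votes written moments earlier).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for "one Vote entity per vote, aggregated by a periodic task". */
class VoteLog {
    private final AtomicLong nextId = new AtomicLong(1);                // plays the role of auto-generated IDs
    private final Map<Long, String> votes = new ConcurrentHashMap<>(); // Vote entities: id -> item voted for
    private final Map<String, Long> totals = new HashMap<>();          // the "global object" holding counts

    /** One insert per vote (one datastore write in the real system). */
    long castVote(String itemId) {
        long id = nextId.getAndIncrement();
        votes.put(id, itemId);
        return id;
    }

    /** Aggregation task: count a snapshot, fold it into the totals, delete only what was counted. */
    synchronized int aggregate() {
        List<Long> snapshot = new ArrayList<>(votes.keySet());
        for (Long id : snapshot) {
            totals.merge(votes.get(id), 1L, Long::sum);
        }
        // Votes cast after the snapshot was taken survive until the next run.
        for (Long id : snapshot) {
            votes.remove(id);
        }
        return snapshot.size();
    }

    synchronized long total(String itemId) {
        return totals.getOrDefault(itemId, 0L);
    }

    int pendingVotes() {
        return votes.size();
    }
}
```

Each vote still costs its insert, plus its share of the query and the delete.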

On Sep 29, 1:47 pm, Peter Dev dev133...@gmail.com wrote:
 Price:
 - with backends, let's say 3 B2 machines = 350 USD/month
 - UrlFetch data sent/received: 0.15 USD/GB

 Limit:
 - URL Fetch daily limit: 46,000,000 calls
   this can be a problem... but I see it is possible to request an
 increase

 Write data to the DB in parallel: a task queue running every 30 seconds could
 be a solution
 (check timestamps in cache and write to DB)

 RESET counters = empty the backend cache and reset each object's counter in
 DB

 Backend cache = HashMap with sharded counter values
 or
 counter values without sharding
 (just incrementing a value in a Java HashMap is fast enough)

 With backends we don't need sharding, I think... what do you think? Thx.




Re: [appengine-java] Re: GAE - Vote counting system

2011-09-30 Thread Jeff Schnitzer
Assuming the goal is massive write throughput, you don't want to do 1
write per vote.  You need to aggregate writes into a batch - you can
do that with pull queues, but then you're limited to the maximum
throughput of a pull queue.  And the biggest batch size is 1,000
which might actually be votes for 1,000 different things, which means
you're back to 1-vote-1-write.
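Jeff's point about batch size can be made concrete: coalescing a leased batch only saves writes when several votes in it are for the same item. A minimal sketch, assuming (hypothetically) that each task payload is just the item ID:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchCoalescer {
    /**
     * Collapses one leased batch of vote tasks (each payload assumed to be
     * just the item ID) into a single delta per item.
     */
    static Map<String, Long> coalesce(List<String> itemIdsInBatch) {
        Map<String, Long> deltas = new HashMap<>();
        for (String itemId : itemIdsInBatch) {
            deltas.merge(itemId, 1L, Long::sum);
        }
        return deltas; // one datastore write per key, so deltas.size() writes for this batch
    }
}
```

A batch of 1,000 votes for one popular item collapses to a single write; a batch of 1,000 votes for 1,000 different items still needs 1,000 writes, which is the 1-vote-1-write case.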

Peter, you can certainly build a system whereby all vote totals are
tracked in RAM in a backend but now you're putting higher memory
requirements on the backend and life gets more complicated when you
deal with sharding.

Depending on how exact you need the counts to be, you can always use
increment() on memcache in addition to incrementing the backend.  The
only catch is that bootstrapping the initial memcache value will be a
little tricky - you'll need to use CAS and query the backends for any
existing deltas.  Or just not care if it's off by a few.
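The bootstrapping problem is that two requests can both find the count missing, both load a starting value, and the later store can overwrite an increment applied in between. A sketch of the "seed only if absent, then increment" pattern, with a `ConcurrentHashMap` standing in for memcache (an assumption so it runs anywhere; on App Engine, the CAS Jeff mentions corresponds to the memcache API's `getIdentifiable`/`putIfUntouched` pair). The `loader` is a hypothetical hook:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/**
 * Stand-in for memcache-held counts. The loader is a hypothetical hook returning
 * the durable total plus any unflushed backend deltas for an item, computed
 * before the vote being applied here has been counted anywhere else.
 */
class CachedCounts {
    private final ConcurrentHashMap<String, Long> cache = new ConcurrentHashMap<>();
    private final ToLongFunction<String> loader;

    CachedCounts(ToLongFunction<String> loader) {
        this.loader = loader;
    }

    /** Seeds the value only if absent (the CAS-like step), then adds the delta atomically. */
    long increment(String itemId, long delta) {
        // computeIfAbsent is atomic per key: two racing bootstraps cannot both
        // store a seed value and wipe out increments applied in between.
        cache.computeIfAbsent(itemId, id -> loader.applyAsLong(id));
        return cache.merge(itemId, delta, Long::sum);
    }

    Long peek(String itemId) {
        return cache.get(itemId);
    }
}
```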

Jeff

On Thu, Sep 29, 2011 at 5:56 AM, Mat Jaggard matjagg...@gmail.com wrote:
 Will that end up being cheaper than the following?

 Put a new entity in the datastore for each vote - Kind: Vote, ID:
 Auto generated, Vote for: Item S
 Have a task queue query all the votes, delete them, then write the
 count of votes to a global object.

 Cost = 1 datastore read + 1 datastore write + some fairly minor task
 processing per vote.




[appengine-java] Re: GAE - Vote counting system

2011-09-29 Thread Peter Dev
Price:
- with backends, let's say 3 B2 machines = 350 USD/month
- UrlFetch data sent/received: 0.15 USD/GB

Limit:
- URL Fetch daily limit: 46,000,000 calls
  this can be a problem... but I see it is possible to request an
increase

Write data to the DB in parallel: a task queue running every 30 seconds could
be a solution
(check timestamps in cache and write to DB)

RESET counters = empty the backend cache and reset each object's counter in
DB

Backend cache = HashMap with sharded counter values
or
counter values without sharding
(just incrementing a value in a Java HashMap is fast enough)

With backends we don't need sharding, I think... what do you think? Thx.




Re: [appengine-java] Re: GAE - Vote counting system

2011-09-29 Thread Jeff Schnitzer
It's hard to say here if we're talking about the same thing, but
here's how I would do it:

 * Updates go through to the backend, which stores write deltas in ram
(not the total).
 * Reads read-through memcache into the datastore.
 * The backend writes deltas to the datastore in batch, updating
memcache, then purging the delta from memory.
 * The backends can write once every 5 mins or 30s or however long
you're comfortable having read data be stale.  More freshness == more
datastore expense.  It's a simple dial.
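The first step, a backend that keeps only per-item deltas in RAM, might look like the sketch below. It assumes the backend serves requests concurrently, so it uses a `ConcurrentHashMap` rather than a plain `HashMap`. Draining removes each key atomically, so votes keep being accepted during a flush instead of being blocked. Names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Per-item vote deltas held in backend memory until the next batch write. */
class DeltaBuffer {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();

    /** Called on every vote: an in-memory add, no datastore work. */
    void record(String itemId, long delta) {
        pending.merge(itemId, delta, Long::sum);
    }

    /** Takes and clears the accumulated deltas without blocking concurrent record() calls. */
    Map<String, Long> drain() {
        Map<String, Long> batch = new HashMap<>();
        for (String itemId : pending.keySet()) {
            Long delta = pending.remove(itemId); // atomic take-and-clear for this key
            if (delta != null) {
                batch.put(itemId, delta);
            }
        }
        // A vote recorded after its key was removed starts a fresh delta for the next drain.
        return batch;
    }

    int pendingItems() {
        return pending.size();
    }
}
```

Whatever `drain()` returns is what the periodic batch write would add to the datastore totals.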

This system is not limited by backend RAM since each backend stores
only deltas - you can probably run the smallest size.  It won't be
limited by read volume, which will come almost entirely from memcache.
It will be limited by max request throughput on the backend.  Given
the update is practically a no-op, how many QPS can a single backend
serve?  That's your limit (times the number of backends running).

I think you'd be hard-pressed to find a better solution to this
problem on GAE.  It does require that reads be stale within a
controlled bound, though.

Jeff

On Thu, Sep 29, 2011 at 5:47 AM, Peter Dev dev133...@gmail.com wrote:
 Price:
 - with backends, let's say 3 B2 machines = 350 USD/month
 - UrlFetch data sent/received: 0.15 USD/GB

 Limit:
 - URL Fetch daily limit: 46,000,000 calls
  this can be a problem... but I see it is possible to request an
 increase

 Write data to the DB in parallel: a task queue running every 30 seconds could
 be a solution
 (check timestamps in cache and write to DB)

 RESET counters = empty the backend cache and reset each object's counter in
 DB

 Backend cache = HashMap with sharded counter values
 or
 counter values without sharding
 (just incrementing a value in a Java HashMap is fast enough)

 With backends we don't need sharding, I think... what do you think? Thx.




[appengine-java] Re: GAE - Vote counting system

2011-09-27 Thread jeffrey_t_b
Many writes to the same object will lead to db failures. You really
should consider sharding:  
http://code.google.com/appengine/articles/sharding_counters.html
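The linked article's technique splits one logical counter into N shards: each increment updates one randomly chosen shard (on the datastore, a separate entity updated in its own transaction), and a read sums all shards. An in-memory analogue, with an `AtomicLongArray` standing in for the shard entities (class name hypothetical):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/** In-memory analogue of a sharded counter: N independent slots, summed on read. */
class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        this.shards = new AtomicLongArray(numShards);
    }

    /** Spreads writes: each increment lands on one randomly chosen shard. */
    void increment() {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(shard);
    }

    /** A read has to visit every shard. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The read side is why per-item sharding gets expensive with millions of items: every total has to touch all N shards.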

On Sep 26, 12:41 am, Peter Dev dev133...@gmail.com wrote:
 We are developing an application where users can vote for many
 objects (for example, voting for the best music video of the week).

 - This means we have millions of possible objects to vote for, and
 millions of users.

 To the best of our knowledge, after considering different
 options, the best (or the only) voting system is: memcache + bulk
 write to DB.
 - If the number of objects in memcache reaches a specified limit (for
 example 3,000), then write them to the DB.

 The write speed to the DB is about 100 objects/sec. This means that if we
 set the limit to the 3,000 objects mentioned above, the write would take
 about 30 sec...

 The problem: during the save to the DB, voting must be blocked. In other
 words, whenever 3,000 of the many millions of objects have been voted on,
 we need to write them to the DB; this can happen very often,
 blocking the whole voting mechanism.

 If we do not block voting while writing to the DB, the result could
 be a wrong number of votes in the cache (see Workflow: step 3).

 Workflow:

 1. vote received
 2. find object in memcache
 3. if not found in memcache, get it from DB and put it in memcache
 4. increment the number of votes of the object in memcache
 5. check the number of objects in memcache
 6. if necessary, save to DB and empty memcache
 ...




[appengine-java] Re: GAE - Vote counting system

2011-09-27 Thread Peter Dev
Sharded counters are cool and I use them... but if you have millions of
objects, I cannot imagine how to manage them.
1 000 000 obj x 100 shards = 10 000 000 counters

1. How do we reset them to 0 at specified intervals?
2. How do we compute the sharded sum for each object to show the top 100 objects?
3. Too many DB API calls (each vote results in a DB write)

Any ideas...? Thx




[appengine-java] Re: GAE - Vote counting system

2011-09-27 Thread Peter Dev
Sorry, 100 000 000 counters

On Sep 27, 4:53 pm, Peter Dev dev133...@gmail.com wrote:
 Sharded counters are cool and I use them... but if you have millions of
 objects, I cannot imagine how to manage them.
 1 000 000 obj x 100 shards = 10 000 000 counters

 1. How do we reset them to 0 at specified intervals?
 2. How do we compute the sharded sum for each object to show the top 100 objects?
 3. Too many DB API calls (each vote results in a DB write)

 Any ideas...? Thx




Re: [appengine-java] Re: GAE - Vote counting system

2011-09-27 Thread Jeff Schnitzer
Yeah, messy.

I'd use a backend for this.  Possibly a set of backends if you need to
shard the data for write volume.  I'd use Memcache only to cache the
count reads.

The basic entity is just an id and a count.  An increment request goes
to a backend, which simply tracks the change.  A batch process goes
through and writes any changed counts to both datastore and memcache
(as an update: read, increment, write in a transaction) every 5 minutes and
clears the in-memory count.
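The batch step (read, increment, write, then refresh the cached value) might look like this. The two maps are hypothetical in-memory stand-ins for the datastore and memcache; in a real deployment each per-item update would run in its own datastore transaction, which the `synchronized` method only approximates.

```java
import java.util.HashMap;
import java.util.Map;

/** Applies a batch of per-item deltas: read, increment, write, then refresh the cache. */
class CountFlusher {
    final Map<String, Long> datastore = new HashMap<>(); // stand-in for durable count entities
    final Map<String, Long> memcache = new HashMap<>();  // stand-in for cached read values

    /** Returns the number of item counts written. */
    synchronized int flush(Map<String, Long> deltas) {
        int writes = 0;
        for (Map.Entry<String, Long> e : deltas.entrySet()) {
            long current = datastore.getOrDefault(e.getKey(), 0L); // read
            long updated = current + e.getValue();                  // increment
            datastore.put(e.getKey(), updated);                     // write
            memcache.put(e.getKey(), updated);                      // refresh what readers see
            writes++;
        }
        return writes;
    }
}
```

The input would be the deltas a backend has accumulated since its last flush. If a flush fails partway and the same batch is retried, items already written would be incremented twice, so a retry has to skip keys that were already applied.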

If write volume is too high for a single backend to handle, shard it
by thingId % number of shards.  You can change the shard count on the
fly this way.

All reads should be read from the memcache, read-through to the
datastore as necessary.

This should be able to handle any volume you want.  If a backend
crashes you'll lose its accumulated counts but I presume that's not a
big deal.

Jeff

On Tue, Sep 27, 2011 at 10:38 AM, Peter Dev dev133...@gmail.com wrote:
 Sorry, 100 000 000 counters