Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Ikai L (Google)
Jeff, point taken, but the original poster has been asking for three
different requirements:

- requirement to do all writes synchronously
- sub-some-couple-hundred-millisecond writes
- 12k entities being written

This just won't scale well if it's common. Messaging users can be done
asynchronously, as can the portion crediting friends. I understand the
argument that you may want to do this during the lifecycle of the request so
the original user gets some kind of feedback backed by a strongly consistent
datastore, but this just isn't done. Feedback is usually faked out
optimistically, assuming that the writes will all be successful with some
cached layer being the only part of the stack being updated inside the
request. Thinking of worst-case scenarios is a good thought exercise, but
it's also a bit too optimistic to design a product assuming all of a user's
friends will play the game and engineer to meet that unrealistic
expectation. What are the standard and slightly non-standard use cases? I'd
probably look at a solution where I can store the data somewhere associated
with the original user for any users not already in the datastore, then
retrieve and generate a score for any of that user's friends on first
access. Facebook's developer admin tool has some pretty good statistics such
as bounce rate, block rate and invitation accept rate that can be used to
tune this design.

Slightly off topic, but we've been asked before if it was possible to
provide different levels of datastore consistency. In some cases I can see
the tradeoffs making sense.
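The lazy-scoring idea above can be sketched in plain Java: rather than fanning out a write to every friend at scoring time, record one event per scoring action and fold pending events into a friend's score on first access. All names here (`LazyScore`, `recordScore`, `scoreFor`) are hypothetical, and the in-memory maps stand in for datastore entities:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lazy score crediting: one write per scoring
// action, regardless of friend count; a friend's credit is derived and
// cached the first time that friend is read.
public class LazyScore {
    // Events recorded once per scoring action, keyed by the scoring user.
    private final Map<String, List<Integer>> pendingCredits = new HashMap<String, List<Integer>>();
    // Materialized scores, written only on first access per user.
    private final Map<String, Integer> scores = new HashMap<String, Integer>();

    // One write per scoring action, not one per friend.
    public void recordScore(String userId, int points) {
        List<Integer> events = pendingCredits.get(userId);
        if (events == null) {
            events = new ArrayList<Integer>();
            pendingCredits.put(userId, events);
        }
        events.add(points);
    }

    // On first access, fold the friends' pending events into this
    // user's credited score and cache the result.
    public int scoreFor(String userId, List<String> friendIds, double creditPortion) {
        Integer cached = scores.get(userId);
        if (cached != null) return cached;
        int total = 0;
        for (String friend : friendIds) {
            List<Integer> events = pendingCredits.get(friend);
            if (events == null) continue;
            for (int points : events) {
                total += (int) (points * creditPortion);
            }
        }
        scores.put(userId, total);  // single write for this user
        return total;
    }
}
```

The write cost per scoring action becomes O(1) instead of O(friends); the fold is paid once per friend, and only for friends who actually show up.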

On Wed, Feb 24, 2010 at 5:52 PM, Jeff Schnitzer j...@infohazard.org wrote:

 On Wed, Feb 24, 2010 at 1:06 PM, Ikai L (Google) ika...@google.com
 wrote:
  My point wasn't necessarily that it wasn't possible. makePersistentAll
  does use a batch write, and there are definitely sites that can do
  12,000+ writes a second (and well above that), but I don't know of any
  that will attempt to do that in a single request. While it's an
  interesting thought exercise to see if BigTable can do it through App
  Engine's interface (hint: it can, globally, easily), I can't think of a
  single use case for a site to need to do this all the time and with the
  sub-second requirement. I think it's reasonable to ask why this design
  exists and why the requirements exist and rethink one or the other.

 It does seem to be a pretty extreme case, but it's not all that
 far-fetched.  It's possible for a Facebook user to have 5,000 friends.
 Perhaps a user wants to message all 5k of them.

 I could actually use this ability right now.  I would like to add a
 game mechanic which, when you score some points, you also credit a
 portion of that to all of a user's friends.  Worst case scenario is a
 5,000 element read followed by a 5,000 element write.  I'm probably
 going to skip this mechanic for now because I can't afford it - even
 with the average 200 or so friends.  If I want it badly enough, I may
 ultimately need to move my scoring system offsite.

 Jeff

 --
 You received this message because you are subscribed to the Google Groups
 Google App Engine for Java group.
 To post to this group, send email to
 google-appengine-j...@googlegroups.com.
 To unsubscribe from this group, send email to
 google-appengine-java+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/google-appengine-java?hl=en.




-- 
Ikai Lan
Developer Programs Engineer, Google App Engine
http://googleappengine.blogspot.com | http://twitter.com/app_engine




Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Jeff Schnitzer
I don't think the original poster had a requirement for synchronous
writes; he just didn't want to do the writes asynchronously because it
involved a lot more code.

I'm also perfectly fine with asynchronous writes and a very lax
interpretation of consistency.  I don't even mind writing extra code.
The thing I worry about is the feasibility of a heavy write load and
the total cost of it.

Unfortunately I really can't describe in detail what I want to do (I
normally laugh at this kind of secrecy, but in this case it's
warranted).  For the game mechanic I'm thinking about, the
average-case scenario is not very far from the worst-case scenario.
Just a little detail:

 * There is no requirement that all of a user's friends must be
playing the game or even have installed the app to receive points.
Welcome to the world of social gaming: you can play without even
knowing it!
 * There are *lots* of FB users that have > 1k friends.  Probably
millions.  More active FB users are likely to have more friends... and
more likely to use my app.
 * Points can be assigned to multiple layers, so the # of updates is
(layers * friends).
 * Tens of thousands of people play this game. It could become
hundreds of thousands very soon.  If I'm lucky, millions.

I would love to implement this game mechanic, but I just can't.
Asynchronous or not, it's *way* too expensive on appengine.  When it
comes time to implement this feature (and it's going to come, I can
see the winds blowing), I'm probably going to have to move my scoring
system out of appengine.  Which is a bit ironic, because one of the
main advantages of appengine is scalability.

I would *love* to see some sort of super-super-lax and
super-super-cheap consistency option for BigTable.  Or even an
alternative key/value datastore that simply works like a persistent
version of memcached.  Something that would let me sustain 10k
writes/sec without bankrupting me.

Jeff

On Thu, Feb 25, 2010 at 11:16 AM, Ikai L (Google) ika...@google.com wrote:

 [quoted text elided]

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread margus pala
Hi

I wrote a GeoIP database importer in JSF; it has around 100k entries.
Although App Engine datastore performance was awful and the import took
around 1.5h of total CPU, the TaskQueue is fairly easy to use. If there
is above-average processing to be done, I suggest splitting the task
into smaller batches and processing them asynchronously.
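The batch splitting Margus describes can be sketched in plain Java. The `Batcher` name is illustrative, and the batch size of 500 matches the datastore's batch-put entity limit at the time; each resulting batch would be handed to one TaskQueue task:

```java
import java.util.ArrayList;
import java.util.List;

// Split a large import into fixed-size batches so each batch fits
// comfortably in one task/request.
public class Batcher {
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // The last batch may be shorter than batchSize.
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<T>(items.subList(i, end)));
        }
        return batches;
    }
}
```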

Margus

On Thu, Feb 25, 2010 at 3:52 AM, Jeff Schnitzer j...@infohazard.org wrote:

 [quoted text elided]





Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Scott Hernandez
Guillermo,

Task queue payloads can only be 10KB
(http://code.google.com/appengine/docs/java/taskqueue/overview.html#Quotas_and_Limits).
The basic idea is that if you have more data than that, you put it into
an entity (in the datastore) and have the task pull it out and
process it. It might be that you can persist those 12K entities inside a
few large entities (each still under 1MB), but that is
a lot of work for something that will still probably fail. I guess it
all depends where the cost is on the puts (indexing, raw writes by
bytes, number of items).

And if your comeback is memcache, well, I won't even start
discussing using a non-persistent, volatile store like that for
temporary storage while you write them to the datastore... in batches
using the taskqueue/cron/etc.

Really, there needs to be something that can handle the write volume.
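Scott's workaround (stash an oversized payload in the datastore and have the task reassemble it) can be sketched as byte-level chunking. The class and method names are made up, and the actual datastore put/get calls are deliberately omitted:

```java
import java.util.ArrayList;
import java.util.List;

// Split a payload that exceeds the task queue's 10KB item limit into
// blob chunks that each stay under the ~1MB entity limit, then rejoin
// them on the task side.
public class PayloadChunker {
    public static List<byte[]> split(byte[] payload, int maxChunkBytes) {
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int offset = 0; offset < payload.length; offset += maxChunkBytes) {
            int len = Math.min(maxChunkBytes, payload.length - offset);
            byte[] chunk = new byte[len];
            System.arraycopy(payload, offset, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int offset = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, offset, c.length);
            offset += c.length;
        }
        return out;
    }
}
```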
On Thu, Feb 25, 2010 at 12:08 PM, Guillermo Schwarz
guillermo.schw...@gmail.com wrote:
 I think there is a way to grab big chunks of operations, put them in a
 queue to be done asynchronously, and that would be it.

 My take would be that using proxies it would be easy to queue any long
 operation transparently. I've done that with EJBs in the past; I don't
 see a reason a QueuingProxy couldn't be written.

 Cheers,
 Guillermo.
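A minimal version of the QueuingProxy Guillermo mentions can be built on java.lang.reflect.Proxy. This is a hypothetical sketch, not anything from the App Engine SDK, and it only makes sense for void, fire-and-forget methods (anything returning a value would need futures):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Queue;

// A dynamic proxy that intercepts calls to an interface and enqueues
// them for later execution instead of running them synchronously.
public class QueuingProxy implements InvocationHandler {
    // Example fire-and-forget interface (illustrative).
    public interface Sink { void write(String s); }

    private final Object target;
    private final Queue<Runnable> queue;

    private QueuingProxy(Object target, Queue<Runnable> queue) {
        this.target = target;
        this.queue = queue;
    }

    @SuppressWarnings("unchecked")
    public static <T> T wrap(T target, Class<T> iface, Queue<Runnable> queue) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, new QueuingProxy(target, queue));
    }

    // Instead of invoking now, capture the call and defer it. Note that
    // even toString/hashCode would be deferred by this naive version.
    public Object invoke(Object proxy, final Method method, final Object[] args) {
        queue.add(new Runnable() {
            public void run() {
                try {
                    method.invoke(target, args);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });
        return null;  // deferred; no synchronous result
    }
}
```

Draining the queue (in a background worker, or a task handler) then replays the captured calls against the real target.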

 On 25 feb, 17:02, Jeff Schnitzer j...@infohazard.org wrote:
  [quoted text elided]

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Ikai L (Google)
We have an issue for an asynchronous write API for the datastore:

http://code.google.com/p/googleappengine/issues/detail?id=2817

This is something that can fit into that model.

On Thu, Feb 25, 2010 at 12:26 PM, Scott Hernandez
scotthernan...@gmail.com wrote:

 [quoted text elided]

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Ikai L (Google)
Simple key-only writes can definitely do it, but there are a few places
where you can introduce overhead:

- serialization
- network I/O
- indexes

My point wasn't necessarily that it wasn't possible. makePersistentAll does
use a batch write, and there are definitely sites that can do 12,000+ writes
a second (and well above that), but I don't know of any that will attempt to
do that in a single request. While it's an interesting thought exercise to
see if BigTable can do it through App Engine's interface (hint: it can,
globally, easily), I can't think of a single use case for a site to need to
do this all the time and with the sub-second requirement. I think it's
reasonable to ask why this design exists and why the requirements exist and
rethink one or the other.

On Wed, Feb 24, 2010 at 12:35 PM, Guillermo Schwarz 
guillermo.schw...@gmail.com wrote:

 Ikai,

 Maybe you are right. Maybe not. I'm not an expert in datastore
 internals, but here is my point of view.

 This paper claims that Berkeley DB Java edition can insert about
 15,000 records per second.

 http://www.oracle.com/database/docs/bdb-je-architecture-whitepaper.pdf

 The graphic is on page 22. The main reason they claim to be able to do
 that is that they don't need to actually sync the write to disk, they
 can queue the write, update in-memory data and write a log file.
 Writing the log file is for transactional purposes and it is the only
 write really needed. That is pretty fast.

 Cheers,
 Guillermo.
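The write path the paper describes (append to a sequential log, update in-memory state, sync the real structures later) can be sketched like this; the in-memory list stands in for an append-only log file, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write-ahead pattern: each put appends one record to a journal (the
// only "I/O" on the write path) and updates an in-memory map. After a
// crash, replaying the journal rebuilds the state.
public class WriteAheadStore {
    private final List<String> log = new ArrayList<String>();   // append-only journal
    private final Map<String, String> data = new HashMap<String, String>();

    public void put(String key, String value) {
        log.add("PUT " + key + "=" + value);  // sequential append: cheap
        data.put(key, value);                 // in-memory update: cheap
    }

    public String get(String key) { return data.get(key); }

    // Replay the journal to rebuild state after a crash.
    public static WriteAheadStore recover(List<String> journal) {
        WriteAheadStore store = new WriteAheadStore();
        for (String record : journal) {
            if (record.startsWith("PUT ")) {
                String[] kv = record.substring(4).split("=", 2);
                store.data.put(kv[0], kv[1]);
            }
        }
        return store;
    }

    public List<String> journal() { return log; }
}
```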

 On 24 feb, 16:51, Ikai L (Google) ika...@google.com wrote:
  I also remember hearing (and this is not verified, so don't quote me on
  this or come after me if I'm wrong) from a friend of mine running KV
  stores in production that there were issues with certain distributed
  key/value stores that actually managed to slow down as a function of
  the number of objects in the store - and Tokyo Tyrant was on his list.
  A key property of scalable stores is that the opposite of this is true.
 
  12,000 synchronous, serialized writes in a single sub-second request is
  pretty serious. I am not aware of a single website in the world that does
  this.
 
  On Wed, Feb 24, 2010 at 11:35 AM, Jeff Schnitzer j...@infohazard.org
 wrote:
 
 
 
   I think this is actually an interesting question, and brings up a
   discussion worth having:
 
   Is datastore performance reasonable?
 
   I don't want to make this a discussion of reliability, which is a
   separate issue.  It just seems to me that the datastore is actually
   kinda pokey, taking seconds to write a few hundred entities.  When
   people benchmark Tokyo Tyrant, I hear numbers thrown around like
   22,000 writes/second sustained across 1M records:
 
  http://blog.hunch.se/2009/02/28-tokyo-cabinet
 
   You might argue that the theoretical scalability of BigTable's
   distributed store is higher... but we're talking about two full orders
    of magnitude difference.  Will I ever come near the 100-google-server
    equivalent load?  Could I pay for it if I did?  100 CPUs (measured)
   running for 1 month is about $7,200.  Actual CPU speed is at least
   twice the measured rate, so a single Tokyo Tyrant is theoretically
   equivalent to almost $15,000/month of appengine hosting.  Ouch.
 
   Maybe this isn't an apples to apples comparison.  Sure, there aren't
   extra indexes on those Tyrant entities... but to be honest, few of my
   entities have extra indexes.  What other factors could change this
   analysis?
 
   Thoughts?
 
   BTW Tim, you may very well have quite a few indexes on your entities.
   In JDO, nearly all single fields are indexed by default.  You must
   explicitly add an annotation to your fields to make them unindexed.
   With Objectify, you can declare your entity as @Indexed or @Unindexed
   and then use the same annotation on individual fields to override the
   default.
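For reference, marking a JDO field unindexed on App Engine looks roughly like this (the `Score` class and its fields are made up for illustration):

```java
import javax.jdo.annotations.Extension;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

@PersistenceCapable
public class Score {
    // Indexed by default in App Engine's JDO implementation.
    @Persistent
    private long points;

    // Explicitly unindexed: skips the index writes on every put.
    @Persistent
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    private String note;
}
```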
 
   Jeff
 
   On Wed, Feb 24, 2010 at 12:43 AM, Tim Cooper tco...@gmail.com wrote:
I have been trying to write 12,000 objects in a single page request.
 These objects are all very small and the total amount of memory is
 not large.  There is no index on these objects - the only GQL queries
 I make on them are based on the primary key.
 
 Ikai has said:  That is - if you have to delete or create 150
 persistent, indexed objects, you may want to rethink what problems
 you are trying to solve.
 
So I have been thinking about the problems I'm trying to solve,
including looking at the BuddyPoke blog and reading the GAE
documentation.  I'm trying to populate the database with entries
relating to high school timetables.
 
* I could do the writes asynchronously, but that looks like a lot of
additional effort. On my C++ app, writing the same information to my
laptop drive, this happens in under a second, because the amount of
data is actually quite small, but it times out on GAE.
* I am using pm.makePersistentAll(), but this doesn't help.
* There is no index on