Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Ikai L (Google)
Jeff, point taken, but the original poster stated three different
requirements:

- requirement to do all writes synchronously
- sub-some-couple-hundred-millisecond writes
- 12k entities being written

This just won't scale well if it's common. Messaging users can be done
asynchronously, as can the portion crediting friends. I understand the
argument that you may want to do this during the lifecycle of the request so
the original user gets some kind of feedback backed by a strongly consistent
datastore, but this just isn't done. Feedback is usually faked out
optimistically, assuming that the writes will all be successful with some
cached layer being the only part of the stack being updated inside the
request. Thinking of worst-case scenarios is a good thought exercise, but
it's also a bit too optimistic to design a product assuming all of a user's
friends will play the game, and to engineer to meet that unrealistic
expectation. What are the standard and slightly non-standard use cases? I'd
probably look at a solution where I can store the data somewhere associated
with the original user for any users not already in the datastore, then
retrieve and generate a score for any of that user's friends on first
access. Facebook's developer admin tool has some pretty good statistics such
as bounce rate, block rate and invitation accept rate that can be used to
tune this design.
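
A minimal sketch of that lazy approach with the low-level datastore API;
"PendingCredit" and "Score" are hypothetical kinds invented for illustration:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

// On a friend's first access, fold any credits parked against their
// Facebook id into a freshly created Score entity, then discard them.
public class LazyScore {
    public static Entity materialize(String facebookId) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Query q = new Query("PendingCredit")
            .addFilter("facebookId", Query.FilterOperator.EQUAL, facebookId);
        long total = 0;
        for (Entity credit : ds.prepare(q).asIterable()) {
            total += (Long) credit.getProperty("points");
            ds.delete(credit.getKey());
        }
        Entity score = new Entity("Score", facebookId); // key name = FB id
        score.setProperty("points", total);
        ds.put(score);
        return score;
    }
}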

Slightly off topic, but we've been asked before if it was possible to
provide different levels of datastore consistency. In some cases I can see
the tradeoffs making sense.

On Wed, Feb 24, 2010 at 5:52 PM, Jeff Schnitzer j...@infohazard.org wrote:

 On Wed, Feb 24, 2010 at 1:06 PM, Ikai L (Google) ika...@google.com
 wrote:
  My point wasn't necessarily that it wasn't possible. [...]

 It does seem to be a pretty extreme case, but it's not all that far
 fetched.  It's possible for a Facebook user to have 5,000 friends.
 Perhaps a user wants to message all 5k of them.

 I could actually use this ability right now.  I would like to add a
 game mechanic which, when you score some points, you also credit a
 portion of that to all of a user's friends.  Worst case scenario is a
 5,000 element read followed by a 5,000 element write.  I'm probably
 going to skip this mechanic for now because I can't afford it - even
 with the average 200 or so friends.  If I want it badly enough, I may
 ultimately need to move my scoring system offsite.

 Jeff

-- 
Ikai Lan
Developer Programs Engineer, Google App Engine
http://googleappengine.blogspot.com | http://twitter.com/app_engine




Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Jeff Schnitzer
I don't think the original poster had a requirement for synchronous
writes; he just didn't want to do the writes asynchronously because it
involved a lot more code.

I'm also perfectly fine with asynchronous writes and a very lax
interpretation of consistency.  I don't even mind writing extra code.
The thing I worry about is the feasibility of a heavy write load and
the total cost of it.

Unfortunately I really can't describe in detail what I want to do (I
normally laugh at this kind of secrecy, but in this case it's
warranted).  For the game mechanic I'm thinking about, the
average-case scenario is not very far from the worst-case scenario.
Just a little detail:

 * There is no requirement that all of a user's friends must be
playing the game or even have installed the app to receive points.
Welcome to the world of social gaming: you can play without even
knowing it!
 * There are *lots* of FB users that have more than 1k friends.  Probably
millions.  More active FB users are likely to have more friends... and
more likely to use my app.
 * Points can be assigned to multiple layers, so the # of updates is
(layers * friends).
 * Tens of thousands of people play this game. It could become
hundreds of thousands very soon.  If I'm lucky, millions.

I would love to implement this game mechanic, but I just can't.
Asynchronous or not, it's *way* too expensive on appengine.  When it
comes time to implement this feature (and it's going to come, I can
see the winds blowing), I'm probably going to have to move my scoring
system out of appengine.  Which is a bit ironic, because one of the
main advantages of appengine is scalability.

I would *love* to see some sort of super-super-lax and
super-super-cheap consistency option for BigTable.  Or even an
alternative key/value datastore that simply works like a persistent
version of memcached.  Something that would let me sustain 10k
writes/sec without bankrupting me.

Jeff

On Thu, Feb 25, 2010 at 11:16 AM, Ikai L (Google) ika...@google.com wrote:
 Jeff, point taken, but the original poster stated three different
 requirements [...]

[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Guillermo Schwarz
I think there is a way to grab big chunks of operations, put them in a
queue to be done asynchronously, and that would be it.

My take is that using proxies it would be easy to queue any long
operation transparently. I've done that with EJBs in the past, and I don't
see any reason a QueuingProxy couldn't be written.
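
A rough sketch of such a QueuingProxy on App Engine, assuming a
hypothetical /tasks/invoke worker servlet that replays the call,
String-convertible arguments, and the labs-era task queue API (later SDKs
renamed Builder.url to withUrl):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import com.google.appengine.api.labs.taskqueue.Queue;
import com.google.appengine.api.labs.taskqueue.QueueFactory;
import com.google.appengine.api.labs.taskqueue.TaskOptions;

// Wraps an interface so each method call is enqueued as a task instead
// of being executed synchronously in the request.
public class QueuingProxy implements InvocationHandler {

    @SuppressWarnings("unchecked")
    public static <T> T wrap(Class<T> iface) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, new QueuingProxy());
    }

    public Object invoke(Object proxy, Method method, Object[] args) {
        TaskOptions task = TaskOptions.Builder.url("/tasks/invoke")
            .param("method", method.getName());
        for (int i = 0; args != null && i < args.length; i++) {
            task = task.param("arg" + i, String.valueOf(args[i]));
        }
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(task);
        return null; // fire-and-forget; the caller gets no result
    }
}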

Cheers,
Guillermo.

On 25 feb, 17:02, Jeff Schnitzer j...@infohazard.org wrote:
 I don't think the original poster had a requirement for synchronous
 writes; he just didn't want to do the writes asynchronously. [...]

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread margus pala
Hi

I wrote a geoip database import in JSF; it has around 100k entries. Even
though App Engine datastore performance is awful and the import took around
1.5h of total CPU, it's fairly easy to use TaskQueue. If there is
above-average processing to be done, I suggest splitting the work into
smaller batches and processing them asynchronously.
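
In sketch form, the fan-out might look like this (the /tasks/importBatch
worker URL and batch size are invented, and the labs-era task queue
package is assumed):

import com.google.appengine.api.labs.taskqueue.Queue;
import com.google.appengine.api.labs.taskqueue.QueueFactory;
import com.google.appengine.api.labs.taskqueue.TaskOptions;

// Enqueue one task per slice of the input instead of importing 100k
// rows in a single request.
public class ImportKickoff {
    private static final int BATCH_SIZE = 500;

    public static void enqueueBatches(int totalRows) {
        Queue queue = QueueFactory.getDefaultQueue();
        for (int start = 0; start < totalRows; start += BATCH_SIZE) {
            queue.add(TaskOptions.Builder.url("/tasks/importBatch")
                .param("start", String.valueOf(start))
                .param("count", String.valueOf(BATCH_SIZE)));
        }
    }
}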

Margus

On Thu, Feb 25, 2010 at 3:52 AM, Jeff Schnitzer j...@infohazard.org wrote:

  It does seem to be a pretty extreme case, but it's not all that
  far-fetched. It's possible for a Facebook user to have 5,000 friends. [...]






Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Scott Hernandez
Guillermo,

Task queue payloads can only be 10K
(http://code.google.com/appengine/docs/java/taskqueue/overview.html#Quotas_and_Limits).
The basic idea is that if you have more data than that you put it into
an entity (in the data-store) and have the task pull it out and
process it. It might be that you can persist those 12K entities in a
lot of large entities (that are still under 1MB each), but that is
a lot of work for something that will still probably fail. I guess it
all depends where the cost is on the puts (indexing, raw writes by
bytes, number of items).
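
A sketch of that hand-off; "PendingWrite" is a hypothetical kind,
/tasks/processPending a hypothetical worker URL, and the payload is
assumed to be already-serialized bytes:

import com.google.appengine.api.datastore.Blob;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.labs.taskqueue.QueueFactory;
import com.google.appengine.api.labs.taskqueue.TaskOptions;

// The payload is too big for a task, so park it in a datastore entity
// and hand the task just the entity's key.
public class LargePayloadTask {

    public static void enqueue(byte[] payload) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Entity pending = new Entity("PendingWrite");
        pending.setUnindexedProperty("payload", new Blob(payload));
        Key key = ds.put(pending);

        QueueFactory.getDefaultQueue().add(
            TaskOptions.Builder.url("/tasks/processPending")
                .param("key", KeyFactory.keyToString(key)));
    }
}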

And if your comeback is "memcache", well, I won't even start
discussing using a non-persistent, volatile store like that for
temporary storage while you write them to the datastore... in batches
using the taskqueue/cron/etc.

Really, there needs to be something that can handle the write volume.
On Thu, Feb 25, 2010 at 12:08 PM, Guillermo Schwarz
guillermo.schw...@gmail.com wrote:
  I think there is a way to grab big chunks of operations, put them in
  a queue to be done asynchronously. [...]

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Ikai L (Google)
We have an open issue tracking an asynchronous write API for the datastore:

http://code.google.com/p/googleappengine/issues/detail?id=2817

This is something that can fit into that model.

On Thu, Feb 25, 2010 at 12:26 PM, Scott Hernandez
scotthernan...@gmail.com wrote:

 Guillermo,

 Taskqueue items can only be 10K
 (
 http://code.google.com/appengine/docs/java/taskqueue/overview.html#Quotas_and_Limits
 ).
 The basic idea is that if you have more data than that you put it into
 an entity (in the data-store) and have the task pull it out and
 process it. It might be that you can persist those 12K entities in a
 lot of large entities (that are still under 1mByte each), but that is
 a lot of work for something that will still probably fail. I guess it
 all depends where the cost is on the puts (indexing, raw writes by
 bytes, number of items).

 And if your comeback is memcache, well, I won't even start
 discussing using a non-persistent, volatile, store like that for
 temporary storage while you write them to the datastore... in batches
 using the taskqueue/cron/etc.

 Really, there needs to be something that can handle the write volume.
 On Thu, Feb 25, 2010 at 12:08 PM, Guillermo Schwarz
 guillermo.schw...@gmail.com wrote:
  I think there is a way to grab big chunks of oprations, put them in a
  queue to be done asynchronously and that would be it.
 
  My take would be that using proxies it would be easy to queue any long
  operation transparently. I've done that with EJBs in the past, I don't
  see the reason a QueingProxy couldn't be written.
 
  Cheers,
  Guillermo.
 
  On 25 feb, 17:02, Jeff Schnitzer j...@infohazard.org wrote:
  I don't think the original poster had a requirement for synchronous
  writes; he just didn't want to do the writes asynchronously because it
  involved a lot more code.
 
  I'm also perfectly fine with asynchronous writes and a very lax
  interpretation of consistency.  I don't even mind writing extra code.
  The thing I worry about is the feasibility of a heavy write load and
  the total cost of it.
 
  Unfortunately I really can't describe in detail what I want to do (I
  normally laugh at this kind of secrecy, but in this case it's
  warranted).  For the game mechanic I'm thinking about, the
  average-case scenario is not very far from the worst-case scenario.
  Just a little detail:
 
   * There is no requirement that all of a user's friends must be
  playing the game or even have installed the app to receive points.
  Welcome to the world of social gaming, you can play without even
  without knowing it!
   * There are *lots* of FB users that have  1k friends.  Probably
  millions.  More active FB users are likely to have more friends... and
  more likely to use my app.
   * Points can be assigned to multiple layers, so the # of updates is
  (layers * friends).
   * Tens of thousands of people play this game. It could become
  hundreds of thousands very soon.  If I'm lucky, millions.
 
  I would love to implement this game mechanic, but I just can't.
  Asynchronous or not, it's *way* too expensive on appengine.  When it
  comes time to implement this feature (and it's going to come, I can
  see the winds blowing), I'm probably going to have to move my scoring
  system out of appengine.  Which is a bit ironic, because one of the
  main advantages of appengine is scalability.
 
  I would *love* to see some sort of super-super-lax and
  super-super-cheap consistency option for BigTable.  Or even an
  alternative key/value datastore that simply works like a persistent
  version of memcached.  Something that would let me sustain 10k
  writes/sec without bankrupting me.
 
  Jeff
 
  On Thu, Feb 25, 2010 at 11:16 AM, Ikai L (Google) ika...@google.com
 wrote:
 
   Jeff, point taken, but the original poster has been asking for three
   different requirements:
   - requirement to do all writes synchronously
   - sub-some-couple-hundred-millisecond writes
   - 12k entities being written
 
   This just won't scale well if it's common. Messaging users can be done
   asynchronously, as can the portion crediting friends. I understand the
   argument that you may want to do this during the lifecycle of the
 request so
   the original user gets some kind of feedback backed by a strongly
 consistent
   datastore, but this just isn't done. Feedback is usually faked out
   optimistically, assuming that the writes will all be successful with
 some
   cached layer being the only part of the stack being updated inside the
   request. Thinking of worse case scenarios is a good thought exercise,
 but
   it's also a bit too optimistic to design a product assuming all of a
 Users'
   friends will play the game and engineer to meet that unrealistic
   expectation. What are the standard and slightly non-standard use
 cases? I'd
   probably look at a solution where I can store the data somewhere
 associated
   with the original user for any users not already in the datastore,
 then
   retrieve and generate a 

[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Guillermo Schwarz
Ikai,

Maybe you are right. Maybe not. I'm not an expert in datastore
internals, but here is my point of view.

This paper claims that Berkeley DB Java edition can insert about
15,000 records per second.

http://www.oracle.com/database/docs/bdb-je-architecture-whitepaper.pdf

The graph is on page 22. The main reason they claim to be able to do
that is that they don't need to actually sync each write to disk; they
can queue the write, update in-memory data, and write a log file.
Writing the log file is for transactional purposes and it is the only
write really needed. That is pretty fast.

Cheers,
Guillermo.

On 24 feb, 16:51, Ikai L (Google) ika...@google.com wrote:
 I also remember hearing (and this is not verified so don't quote me on this
 or come after me if I'm wrong) from a friend of mine running KV stores in
 production that there were issues with certain distributed key/value stores
 that actually managed to slow down as a function of the number of objects in
 the store - and Tokyo Tyrant was on his list. A key property of scalable
 stores is that the opposite of this is true.

 12,000 synchronous, serialized writes in a single sub-second request is
 pretty serious. I am not aware of a single website in the world that does
 this.

  On Wed, Feb 24, 2010 at 11:35 AM, Jeff Schnitzer j...@infohazard.org wrote:



  I think this is actually an interesting question, and brings up a
  discussion worth having:

  Is datastore performance reasonable?

  I don't want to make this a discussion of reliability, which is a
  separate issue.  It just seems to me that the datastore is actually
  kinda pokey, taking seconds to write a few hundred entities.  When
  people benchmark Tokyo Tyrant, I hear numbers thrown around like
  22,000 writes/second sustained across 1M records:

 http://blog.hunch.se/2009/02/28-tokyo-cabinet

  You might argue that the theoretical scalability of BigTable's
  distributed store is higher... but we're talking about two full orders
  of magnitude difference.  Will I ever near the 100-google-server
  equivalent load?  Could I pay for it if I did?  100 CPUs (measured)
  running for 1 month is about $7,200.  Actual CPU speed is at least
  twice the measured rate, so a single Tokyo Tyrant is theoretically
  equivalent to almost $15,000/month of appengine hosting.  Ouch.

  Maybe this isn't an apples to apples comparison.  Sure, there aren't
  extra indexes on those Tyrant entities... but to be honest, few of my
  entities have extra indexes.  What other factors could change this
  analysis?

  Thoughts?

  BTW Tim, you may very well have quite a few indexes on your entities.
  In JDO, nearly all single fields are indexed by default.  You must
  explicitly add an annotation to your fields to make them unindexed.
  With Objectify, you can declare your entity as @Indexed or @Unindexed
  and then use the same annotation on individual fields to override the
  default.
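
  Concretely, the two approaches look roughly like this (entity and field
  names invented; the JDO extension key is the one documented for App
  Engine at the time):

import javax.jdo.annotations.Extension;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

// JDO: single-valued fields are indexed by default; opt out per field.
@PersistenceCapable
class ScoreJdo {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long id;

    @Persistent
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    private String note; // no index rows written for this property
}

// Objectify: set the default on the class, override per field.
// import com.googlecode.objectify.annotation.Indexed;
// import com.googlecode.objectify.annotation.Unindexed;
//
// @Unindexed
// class ScoreOfy {
//     @javax.persistence.Id Long id;
//     @Indexed String facebookId; // only this field gets index rows
//     String note;                // unindexed via the class default
// }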

  Jeff

  On Wed, Feb 24, 2010 at 12:43 AM, Tim Cooper tco...@gmail.com wrote:
   I have been trying to write 12,000 objects in a single page request.
   These objects are all very small and the total amount of memory is not
   large.  There is no index on these objects - the only GQL queries I
   make on them are based on the primary key.

    Ikai has said: "That is - if you have to delete or create 150
    persistent, indexed objects, you may want to rethink what problems you
    are trying to solve."

   So I have been thinking about the problems I'm trying to solve,
   including looking at the BuddyPoke blog and reading the GAE
   documentation.  I'm trying to populate the database with entries
   relating to high school timetables.

   * I could do the writes asynchronously, but that looks like a lot of
   additional effort. On my C++ app, writing the same information to my
   laptop drive, this happens in under a second, because the amount of
   data is actually quite small, but it times out on GAE.
   * I am using pm.makePersistentAll(), but this doesn't help.
   * There is no index on the objects - I access them only through the
   primary key.  (I'm pretty sure there's no index - but how can I
   confirm this via the development server dashboard?)
   * The objects constitute 12,000 entity groups.  I could merge them
   into fewer entity groups, but there's no natural groupings I could
   use, so it could get quite complex to introduce a contrived grouping,
   and also this would complicate the multi-user updating of the objects.
    The AppEngine team seem to generally recommend using more entity
   groups, but it's difficult to integrate that advice with the contrary
   advice to use fewer entity groups for acceptable performance.
    * I'd be happy if the GAE database was less than 10 times slower than a
   non-cloud RDBMS, but the way I'm using it, it's currently not.

   Does anyone have any advice?


Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Ikai L (Google)
Simple key-only writes can definitely do it, but there are a few places where
you can introduce overhead:

- serialization
- network I/O
- indexes

My point wasn't necessarily that it wasn't possible. makePersistentAll does
use a batch write, and there are definitely sites that can do 12,000+ writes
a second (and well above that), but I don't know of any that will attempt to
do that in a single request. While it's an interesting thought exercise to
see if BigTable can do it through App Engine's interface (hint: it can,
globally, easily), I can't think of a single use case for a site to need to
do this all the time and with the sub-second requirement. I think it's
reasonable to ask why this design exists and why the requirements exist and
rethink one or the other.
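
For reference, the batch call in question looks like this (PMF is the
usual PersistenceManagerFactory singleton from the App Engine docs,
assumed to exist in the app):

import java.util.List;
import javax.jdo.PersistenceManager;

// One makePersistentAll call issues a batch put rather than one
// datastore RPC per object.
public class BatchWrite {
    public static void writeAll(List<?> entities) {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        try {
            pm.makePersistentAll(entities);
        } finally {
            pm.close();
        }
    }
}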

On Wed, Feb 24, 2010 at 12:35 PM, Guillermo Schwarz 
guillermo.schw...@gmail.com wrote:

 Ikai,

 Maybe you are right. Maybe not. [...]

[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Larry Cable
My experience with a relatively simple application via JDO
makePersistentAll() was that I got
DataStore Operation Timeout exceptions with batch sizes of approx
200-300 objects ...

On Feb 24, 1:48 pm, Guillermo Schwarz guillermo.schw...@gmail.com
wrote:
 I think we can safely assume that the programmer was trying to speed
 up things a little by writing 12 thousand objects in a single
 operation.

 Now whether that turns out to be faster or slower than writing each object
 separately is a matter of the internal implementation of the data
 store. I prefer to do no hacks, but OTOH it is sometimes better to be
 clear about what you want (API-wise).

 The point here is that the programmer wants to insert 15 thousand
 objects in a second, and you seem to imply that is possible:
 "While it's an interesting thought exercise to see if BigTable can do
 it through App Engine's interface (hint: it can, globally, easily)."

 I rest my case ;-)

 Do we need to do anything to test that? Is there anything we could do
 to help?

 Cheers,
 Guillermo.

 On 24 feb, 18:06, Ikai L (Google) ika...@google.com wrote:



   Simple key-only writes can definitely do it, but there are a few
   places where you can introduce overhead [...]