[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Guillermo Schwarz
Ikai,

Maybe you are right. Maybe not. I'm not an expert in datastore
internals, but here is my point of view.

This paper claims that Berkeley DB Java Edition can insert about
15,000 records per second.

http://www.oracle.com/database/docs/bdb-je-architecture-whitepaper.pdf

The graphic is on page 22. The main reason they claim to be able to do
that is that they don't need to actually sync the write to disk, they
can queue the write, update in-memory data and write a log file.
Writing the log file is for transactional purposes, and it is the only
write really needed. That is pretty fast.

Cheers,
Guillermo.

On Feb 24, 16:51, "Ikai L (Google)"  wrote:
> I also remember hearing (and this is not verified so don't quote me on this
> or come after me if I'm wrong) from a friend of mine running KV stores in
> production that there were issues with certain distributed key/value stores
> that actually managed to slow down as a function of the number of objects in
> the store - and Tokyo Tyrant was on his list. A key property of scalable
> stores is that the opposite of this is true.
>
> 12,000 synchronous, serialized writes in a single sub-second request is
> pretty serious. I am not aware of a single website in the world that does
> this.
>
> On Wed, Feb 24, 2010 at 11:35 AM, Jeff Schnitzer wrote:
>
>
>
> > I think this is actually an interesting question, and brings up a
> > discussion worth having:
>
> > Is datastore performance reasonable?
>
> > I don't want to make this a discussion of reliability, which is a
> > separate issue.  It just seems to me that the datastore is actually
> > kinda pokey, taking seconds to write a few hundred entities.  When
> > people benchmark Tokyo Tyrant, I hear numbers thrown around like
> > 22,000 writes/second sustained across 1M records:
>
> > http://blog.hunch.se/2009/02/28-tokyo-cabinet
>
> > You might argue that the theoretical scalability of BigTable's
> > distributed store is higher... but we're talking about two full orders
> > of magnitude difference.  Will I ever near the 100-google-server
> > equivalent load?  Could I pay for it if I did?  100 CPUs (measured)
> > running for 1 month is about $7,200.  Actual CPU speed is at least
> > twice the measured rate, so a single Tokyo Tyrant is theoretically
> > equivalent to almost $15,000/month of appengine hosting.  Ouch.
>
> > Maybe this isn't an apples to apples comparison.  Sure, there aren't
> > extra indexes on those Tyrant entities... but to be honest, few of my
> > entities have extra indexes.  What other factors could change this
> > analysis?
>
> > Thoughts?
>
> > BTW Tim, you may very well have quite a few indexes on your entities.
> > In JDO, nearly all single fields are indexed by default.  You must
> > explicitly add an annotation to your fields to make them unindexed.
> > With Objectify, you can declare your entity as @Indexed or @Unindexed
> > and then use the same annotation on individual fields to override the
> > default.
>
> > Jeff
>
> > On Wed, Feb 24, 2010 at 12:43 AM, Tim Cooper  wrote:
> > > I have been trying to write 12,000 objects in a single page request.
> > > These objects are all very small and the total amount of memory is not
> > > large.  There is no index on these objects - the only GQL queries I
> > > make on them are based on the primary key.
>
> > > Ikai has said:  "That is - if you have to delete or create 150
> > > persistent, indexed objects, you may want to rethink what problems you
> > > are trying to solve."
>
> > > So I have been thinking about the problems I'm trying to solve,
> > > including looking at the BuddyPoke blog and reading the GAE
> > > documentation.  I'm trying to populate the database with entries
> > > relating to high school timetables.
>
> > > * I could do the writes asynchronously, but that looks like a lot of
> > > additional effort. On my C++ app, writing the same information to my
> > > laptop drive, this happens in under a second, because the amount of
> > > data is actually quite small, but it times out on GAE.
> > > * I am using pm.makePersistentAll(), but this doesn't help.
> > > * There is no index on the objects - I access them only through the
> > > primary key.  (I'm pretty sure there's no index - but how can I
> > > confirm this via the development server dashboard?)
> > > * The objects constitute 12,000 entity groups.  I could merge them
> > > into fewer entity groups, but there's no natural groupings I could
> > > use, so it could get quite complex to introduce a contrived grouping,
> > > and also this would complicate the multi-user updating of the objects.
> > >  The App Engine team seems to generally recommend using more entity
> > > groups, but it's difficult to integrate that advice with the contrary
> > > advice to use fewer entity groups for acceptable performance.
> > > * I'd be happy if the GAE database was < 10 times slower than a
> > > non-cloud RDBMS, but the way I'm using it, it's currently not.
>
> > > Does anyone have any advice?
>
> > >
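As a concrete reference for Jeff's point about JDO indexing defaults, here is a sketch of how a field is opted out of indexing in App Engine's JDO layer (via a DataNucleus extension) and, in a comment, Objectify's class-level equivalent. The entity and field names are made up for illustration:

```java
import javax.jdo.annotations.Extension;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

@PersistenceCapable
public class TimetableEntry {
    @Persistent
    private String room;   // single-property index created by default

    // Opt this field out of the automatic index:
    @Persistent
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    private String notes;
}

// With Objectify, the default can be flipped per class instead:
// @Unindexed public class TimetableEntry { @Indexed String room; String notes; }
```

Unindexed fields avoid the per-property index writes, which is where much of a bulk put's cost goes.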

[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Guillermo Schwarz
I think we can safely assume that the programmer was trying to speed
things up a little by writing 12 thousand objects in a single
operation.

Whether that turns out to be faster or slower than writing each object
separately is a matter of the internal implementation of the
datastore. I prefer not to rely on hacks, but OTOH it is sometimes
better to be clear, API-wise, about what you want.

The point here is that the programmer wants to insert 12 thousand
objects in a second, and you seem to imply that is possible:
"While it's an interesting thought exercise to see if BigTable can do
it through App Engine's interface (hint: it can, globally, easily)".

I rest my case ;-)

Do we need to do anything to test that? Is there anything we could do
to help?

Cheers,
Guillermo.

On Feb 24, 18:06, "Ikai L (Google)"  wrote:
> Simple key-only writes can definitely do it, but there are a few places where
> you can introduce overhead:
>
> - serialization
> - network I/O
> - indexes
>
> My point wasn't necessarily that it wasn't possible. makePersistentAll does
> use a batch write, and there are definitely sites that can do 12,000+ writes
> a second (and well above that), but I don't know of any that will attempt to
> do that in a single request. While it's an interesting thought exercise to
> see if BigTable can do it through App Engine's interface (hint: it can,
> globally, easily), I can't think of a single use case for a site to need to
> do this all the time and with the sub-second requirement. I think it's
> reasonable to ask why this design exists and why the requirements exist and
> rethink one or the other.

[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Larry Cable
My experience with a relatively simple application using JDO's
makePersistentAll() was that I got datastore operation timeout
exceptions with batch sizes of approximately 200-300 objects.
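A common workaround for those timeouts is to split a large put into smaller chunks and call makePersistentAll() once per chunk (or hand each chunk to a task queue). A minimal sketch of the chunking logic, with a hypothetical persistBatch callback standing in for pm.makePersistentAll():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter {
    // Split a large write into datastore-friendly chunks; persistBatch would
    // wrap pm.makePersistentAll(chunk) in a real App Engine app.
    static <T> int writeInChunks(List<T> items, int chunkSize, Consumer<List<T>> persistBatch) {
        int batches = 0;
        for (int i = 0; i < items.size(); i += chunkSize) {
            persistBatch.accept(items.subList(i, Math.min(i + chunkSize, items.size())));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) items.add(i);
        // Stay under the observed 200-300 failure threshold per call.
        int batches = writeInChunks(items, 250, chunk -> { /* pm.makePersistentAll(chunk) */ });
        System.out.println(batches); // 12,000 items / 250 per call = 48 calls
    }
}
```

Each call then stays well below the batch size at which the timeouts were observed, at the cost of more round trips.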


[appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Guillermo Schwarz
I think there is a way to grab big chunks of operations, put them in a
queue to be processed asynchronously, and that would be it.

My take is that using proxies it would be easy to queue any long
operation transparently. I've done that with EJBs in the past, and I
don't see why a QueueingProxy couldn't be written.

Cheers,
Guillermo.
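The QueueingProxy idea can be sketched with a plain JDK dynamic proxy that records invocations on a queue instead of executing them; a worker (for example a task-queue handler) drains the queue later. All names here are hypothetical:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayDeque;
import java.util.Queue;

public class QueueingProxy {
    public interface Store {
        void persistAll(Object batch);
    }

    // Wrap a target so that calls are queued for later, asynchronous execution.
    @SuppressWarnings("unchecked")
    static <T> T queueing(Class<T> iface, T target, Queue<Runnable> queue) {
        InvocationHandler h = (proxy, method, args) -> {
            queue.add(() -> {
                try {
                    method.invoke(target, args);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            return null; // only suitable for void methods
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, h);
    }

    public static void main(String[] args) {
        Queue<Runnable> queue = new ArrayDeque<>();
        final int[] persisted = {0};
        Store real = batch -> persisted[0]++;
        Store async = queueing(Store.class, real, queue);

        async.persistAll("batch-1");   // returns immediately; nothing written yet
        async.persistAll("batch-2");
        System.out.println(persisted[0] + " " + queue.size()); // 0 2

        while (!queue.isEmpty()) queue.poll().run(); // drain, e.g. in a worker
        System.out.println(persisted[0]); // 2
    }
}
```

The caller sees the normal Store interface; only the wiring decides whether writes happen inline or from a queue.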

On Feb 25, 17:02, Jeff Schnitzer  wrote:
> I don't think the original poster had a requirement for synchronous
> writes; he just didn't want to do the writes asynchronously because it
> involved a lot more code.
>
> I'm also perfectly fine with asynchronous writes and a very lax
> interpretation of consistency.  I don't even mind writing extra code.
> The thing I worry about is the feasibility of a heavy write load and
> the total cost of it.
>
> Unfortunately I really can't describe in detail what I want to do (I
> normally laugh at this kind of secrecy, but in this case it's
> warranted).  For the game mechanic I'm thinking about, the
> average-case scenario is not very far from the worst-case scenario.
> Just a few details:
>
>  * There is no requirement that all of a user's friends must be
> playing the game or even have installed the app to receive points.
> Welcome to the world of social gaming: you can play without even
> knowing it!
>  * There are *lots* of FB users that have > 1k friends.  Probably
> millions.  More active FB users are likely to have more friends... and
> more likely to use my app.
>  * Points can be assigned to multiple layers, so the # of updates is
> (layers * friends).
>  * Tens of thousands of people play this "game". It could become
> hundreds of thousands very soon.  If I'm lucky, millions.
>
> I would love to implement this game mechanic, but I just can't.
> Asynchronous or not, it's *way* too expensive on appengine.  When it
> comes time to implement this feature (and it's going to come, I can
> see the winds blowing), I'm probably going to have to move my scoring
> system out of appengine.  Which is a bit ironic, because one of the
> main advantages of appengine is scalability.
>
> I would *love* to see some sort of super-super-lax and
> super-super-cheap consistency option for BigTable.  Or even an
> alternative key/value datastore that simply works like a persistent
> version of memcached.  Something that would let me sustain 10k
> writes/sec without bankrupting me.
>
> Jeff

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Ikai L (Google)
Simple key-only writes can definitely do it, but there are a few places where
you can introduce overhead:

- serialization
- network I/O
- indexes
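Of those three, serialization alone is easy to bound with a quick experiment. This sketch uses stock Java serialization rather than the datastore's actual protocol-buffer encoding, so it only shows the order of magnitude; 12,000 tiny objects serialize in tens of milliseconds on commodity hardware, suggesting the real cost sits in network I/O and index writes. The entity class is made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCost {
    // Hypothetical small entity, standing in for one timetable record.
    static class Entry implements Serializable {
        final int id;
        final String label;
        Entry(int id, String label) { this.id = id; this.label = label; }
    }

    // Serialize n small entities and return the total byte count.
    static int serialize(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.writeObject(new Entry(i, "entry-" + i));
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        int size = serialize(12_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(size + " bytes serialized in " + ms + " ms");
    }
}
```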

My point wasn't necessarily that it wasn't possible. makePersistentAll does
use a batch write, and there are definitely sites that can do 12,000+ writes
a second (and well above that), but I don't know of any that will attempt to
do that in a single request. While it's an interesting thought exercise to
see if BigTable can do it through App Engine's interface (hint: it can,
globally, easily), I can't think of a single use case for a site to need to
do this all the time and with the sub-second requirement. I think it's
reasonable to ask why this design exists and why the requirements exist and
rethink one or the other.


Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-24 Thread Jeff Schnitzer
On Wed, Feb 24, 2010 at 1:06 PM, Ikai L (Google)  wrote:
> My point wasn't necessarily that it wasn't possible. makePersistentAll does
> use a batch write, and there are definitely sites that can do 12,000+ writes
> a second (and well above that), but I don't know of any that will attempt to
> do that in a single request. While it's an interesting thought exercise to
> see if BigTable can do it through App Engine's interface (hint: it can,
> globally, easily), I can't think of a single use case for a site to need to
> do this all the time and with the sub-second requirement. I think it's
> reasonable to ask why this design exists and why the requirements exist and
> rethink one or the other.

It does seem to be a pretty extreme case, but it's not all that
far-fetched.  It's possible for a Facebook user to have 5,000 friends.
Perhaps a user wants to message all 5k of them.

I could actually use this ability right now.  I would like to add a
game mechanic in which, when you score some points, you also credit a
portion of that to all of a user's friends.  Worst case scenario is a
5,000 element read followed by a 5,000 element write.  I'm probably
going to skip this mechanic for now because I can't afford it - even
with the average 200 or so friends.  If I want it badly enough, I may
ultimately need to move my scoring system offsite.

Jeff

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine for Java" group.
To post to this group, send email to google-appengine-j...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en.



Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Ikai L (Google)
Jeff, point taken, but the original poster has been asking for three
different requirements:

- requirement to do all writes synchronously
- sub-some-couple-hundred-millisecond writes
- 12k entities being written

This just won't scale well if it's common. Messaging users can be done
asynchronously, as can the portion crediting friends. I understand the
argument that you may want to do this during the lifecycle of the request so
the original user gets some kind of feedback backed by a strongly consistent
datastore, but this just isn't done. Feedback is usually faked out
optimistically, assuming that the writes will all be successful with some
cached layer being the only part of the stack being updated inside the
request. Thinking of worst-case scenarios is a good thought exercise, but
it's also a bit too optimistic to design a product assuming all of a user's
friends will play the game and engineer to meet that unrealistic
expectation. What are the standard and slightly non-standard use cases? I'd
probably look at a solution where I can store the data somewhere associated
with the original user for any users not already in the datastore, then
retrieve and generate a score for any of that User's friends on first
access. Facebook's developer admin tool has some pretty good statistics such
as bounce rate, block rate and invitation accept rate that can be used to
tune this design.
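Ikai's suggestion, writing the credit once with the scoring user and fanning out only at read time, can be sketched with in-memory maps standing in for datastore entities (all names and the 10% split are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Read-time scoring: record one "credit pool" per scorer instead of writing
// one entity per friend, and aggregate a user's bonus from their friends'
// pools when that user's score is actually displayed.
public class LazyScoring {
    final Map<String, Long> creditPool = new HashMap<>();   // one row per scorer
    final Map<String, List<String>> friendsOf;

    public LazyScoring(Map<String, List<String>> friendsOf) {
        this.friendsOf = friendsOf;
    }

    // Scoring is a single write, regardless of how many friends the user has.
    public void score(String user, long points) {
        creditPool.merge(user, points / 10, Long::sum);     // friends share 10%
    }

    // The fan-out happens at read time, for the one user being displayed.
    public long friendBonus(String user) {
        long bonus = 0;
        for (String friend : friendsOf.getOrDefault(user, List.of())) {
            bonus += creditPool.getOrDefault(friend, 0L);
        }
        return bonus;
    }
}
```

This turns a (layers * friends) write amplification at scoring time into one write plus a bounded read when a score is viewed.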

Slightly off topic, but we've been asked before if it was possible to
provide different levels of datastore consistency. In some cases I can see
the tradeoffs making sense.



-- 
Ikai Lan
Developer Programs Engineer, Google App Engine
http://googleappengine.blogspot.com | http://twitter.com/app_engine




Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Jeff Schnitzer
I don't think the original poster had a requirement for synchronous
writes; he just didn't want to do the writes asynchronously because it
involved a lot more code.

I'm also perfectly fine with asynchronous writes and a very lax
interpretation of consistency.  I don't even mind writing extra code.
The thing I worry about is the feasibility of a heavy write load and
the total cost of it.

Unfortunately I really can't describe in detail what I want to do (I
normally laugh at this kind of secrecy, but in this case it's
warranted).  For the game mechanic I'm thinking about, the
average-case scenario is not very far from the worst-case scenario.
Just a little detail:

 * There is no requirement that all of a user's friends must be
playing the game or even have installed the app to receive points.
Welcome to the world of social gaming: you can play without even knowing it!
 * There are *lots* of FB users that have > 1k friends.  Probably
millions.  More active FB users are likely to have more friends... and
more likely to use my app.
 * Points can be assigned to multiple layers, so the # of updates is
(layers * friends).
 * Tens of thousands of people play this "game". It could become
hundreds of thousands very soon.  If I'm lucky, millions.
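To put rough numbers on that fan-out (the friend and layer counts below are illustrative assumptions, not figures from this thread), the update count grows as layers * friends:

```java
// Back-of-the-envelope fan-out cost for the (layers * friends) mechanic.
// All concrete numbers here are hypothetical, for illustration only.
public class FanOutCost {

    // Datastore writes triggered by a single scoring event.
    static long writesPerEvent(int layers, int friends) {
        return (long) layers * friends;
    }

    public static void main(String[] args) {
        // Worst case: a user at Facebook's 5,000-friend cap, 3 layers.
        long worst = writesPerEvent(3, 5000);   // 15,000 writes
        // Average case: ~200 friends, same 3 layers.
        long avg = writesPerEvent(3, 200);      // 600 writes
        System.out.println("worst=" + worst + " avg=" + avg);
        // Even the average case is hundreds of writes per scoring event,
        // which is why the mechanic is expensive whether sync or async.
    }
}
```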

I would love to implement this game mechanic, but I just can't.
Asynchronous or not, it's *way* too expensive on appengine.  When it
comes time to implement this feature (and it's going to come, I can
see the winds blowing), I'm probably going to have to move my scoring
system out of appengine.  Which is a bit ironic, because one of the
main advantages of appengine is scalability.

I would *love* to see some sort of super-super-lax and
super-super-cheap consistency option for BigTable.  Or even an
alternative key/value datastore that simply works like a persistent
version of memcached.  Something that would let me sustain 10k
writes/sec without bankrupting me.
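What Jeff is describing is essentially a write-behind cache: accept writes in memory at memcached-like speed and persist them later with very lax consistency. A minimal generic sketch (the class, its flush mechanism, and the in-memory stand-in for durable storage are all invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical write-behind sketch: puts land in memory immediately; a
// background drain (thread, cron, or task queue) persists them later.
public class WriteBehindStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<String> dirty = new ConcurrentLinkedQueue<>();
    // Stand-in for the durable datastore.
    private final Map<String, String> durable = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        cache.put(key, value);   // cheap, immediate
        dirty.add(key);          // remember to persist later
    }

    public String get(String key) {
        String v = cache.get(key);
        return v != null ? v : durable.get(key);
    }

    public int pendingWrites() {
        return dirty.size();
    }

    // Called periodically; a real system would batch these writes.
    public void flush() {
        String key;
        while ((key = dirty.poll()) != null) {
            durable.put(key, cache.get(key));
        }
    }
}
```

The lax-consistency trade-off is explicit here: a crash between `put` and `flush` loses the queued writes, which is exactly the bargain being proposed for a scoring system.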

Jeff

On Thu, Feb 25, 2010 at 11:16 AM, Ikai L (Google)  wrote:
> Jeff, point taken, but the original poster has been asking for three
> different requirements:
> - requirement to do all writes synchronously
> - sub-some-couple-hundred-millisecond writes
> - 12k entities being written
>
> This just won't scale well if it's common. Messaging users can be done
> asynchronously, as can the portion crediting friends. I understand the
> argument that you may want to do this during the lifecycle of the request so
> the original user gets some kind of feedback backed by a strongly consistent
> datastore, but this just isn't done. Feedback is usually faked out
> optimistically, assuming that the writes will all be successful with some
> cached layer being the only part of the stack being updated inside the
> request. Thinking of worst-case scenarios is a good thought exercise, but
> it's also a bit too optimistic to design a product assuming all of a user's
> friends will play the game and engineer to meet that unrealistic
> expectation. What are the standard and slightly non-standard use cases? I'd
> probably look at a solution where I can store the data somewhere associated
> with the original user for any users not already in the datastore, then
> retrieve and generate a score for any of that User's friends on first
> access. Facebook's developer admin tool has some pretty good statistics such
> as bounce rate, block rate and invitation accept rate that can be used to
> tune this design.
> Slightly off topic, but we've been asked before if it was possible to
> provide different levels of datastore consistency. In some cases I can see
> the tradeoffs making sense.
>
> On Wed, Feb 24, 2010 at 5:52 PM, Jeff Schnitzer  wrote:
>>
>> On Wed, Feb 24, 2010 at 1:06 PM, Ikai L (Google) 
>> wrote:
>> > My point wasn't necessarily that it wasn't possible. makePersistentAll
>> > does
>> > use a batch write, and there are definitely sites that can do 12,000+
>> > writes
>> > a second (and well above that), but I don't know of any that will
>> > attempt to
>> > do that in a single request. While it's an interesting thought exercise
>> > to
>> > see if BigTable can do it through App Engine's interface (hint: it can,
>> > globally, easily), I can't think of a single use case for a site to need
>> > to
>> > do this all the time and with the sub-second requirement. I think it's
>> > reasonable to ask why this design exists and why the requirements exist
>> > and
>> > rethink one or the other.
>>
>> It does seem to be a pretty extreme case, but it's not all that far
>> fetched.  It's possible for a Facebook user to have 5,000 friends.
>> Perhaps a user wants to message all 5k of them.
>>
>> I could actually use this ability right now.  I would like to add a
>> game mechanic which, when you score some points, you also credit a
>> portion of that to all of a user's friends.  Worst case scenario is a
>> 5,000 element read followed by a 5,000 element write.  I'm probably
>> going to skip this mechanic for now because I can't afford it -

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread margus pala
Hi

I wrote a GeoIP database importer in JSF; the database has around 100k
entries. Although App Engine datastore write performance is poor (the import
took around 1.5 hours of total CPU), the TaskQueue is fairly easy to use. If
there is above-average processing to be done, I suggest splitting the task
into smaller batches and processing them asynchronously.
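That batch split can be sketched as below (the batch size of 500 is an assumption chosen to match the datastore's batch put limit of the era; treat it as illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a large import into smaller batches, each
// small enough for one task queue task to handle within its deadline.
public class Batching {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) rows.add(i);
        // 12,000 rows in batches of 500 -> 24 tasks.
        List<List<Integer>> batches = partition(rows, 500);
        System.out.println(batches.size()); // -> 24
    }
}
```

Each batch would then be enqueued as its own task, so a failure retries only that slice of the import rather than the whole 12k-row job.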

Margus

On Thu, Feb 25, 2010 at 3:52 AM, Jeff Schnitzer  wrote:

> On Wed, Feb 24, 2010 at 1:06 PM, Ikai L (Google) 
> wrote:
> > My point wasn't necessarily that it wasn't possible. makePersistentAll
> does
> > use a batch write, and there are definitely sites that can do 12,000+
> writes
> > a second (and well above that), but I don't know of any that will attempt
> to
> > do that in a single request. While it's an interesting thought exercise
> to
> > see if BigTable can do it through App Engine's interface (hint: it can,
> > globally, easily), I can't think of a single use case for a site to need
> to
> > do this all the time and with the sub-second requirement. I think it's
> > reasonable to ask why this design exists and why the requirements exist
> and
> > rethink one or the other.
>
> It does seem to be a pretty extreme case, but it's not all that far
> fetched.  It's possible for a Facebook user to have 5,000 friends.
> Perhaps a user wants to message all 5k of them.
>
> I could actually use this ability right now.  I would like to add a
> game mechanic which, when you score some points, you also credit a
> portion of that to all of a user's friends.  Worst case scenario is a
> 5,000 element read followed by a 5,000 element write.  I'm probably
> going to skip this mechanic for now because I can't afford it - even
> with the average 200 or so friends.  If I want it badly enough, I may
> ultimately need to move my scoring system offsite.
>
> Jeff
>




Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Scott Hernandez
Guillermo,

Task queue payloads are limited to 10KB
(http://code.google.com/appengine/docs/java/taskqueue/overview.html#Quotas_and_Limits).
The basic idea is that if you have more data than that you put it into
an entity (in the data-store) and have the task pull it out and
process it. It might be that you can persist those 12K entities in a
lot of large entities (that are still under 1MB each), but that is
a lot of work for something that will still probably fail. I guess it
all depends where the cost is on the puts (indexing, raw writes by
bytes, number of items).

And if your comeback is "memcache", well, I won't even start
discussing using a non-persistent, volatile, store like that for
temporary storage while you write them to the datastore... in batches
using the taskqueue/cron/etc.

Really, there needs to be something that can handle the write volume.
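The staging idea Scott describes can be sketched as follows: serialize the real data, chunk it below the roughly 1MB entity size limit, store the chunks as entities, and hand the task only their keys. The class and size constants below are assumptions for illustration, not App Engine APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of staging a large payload as several datastore
// entities, each under the ~1MB entity size limit.
public class PayloadChunker {

    static List<byte[]> chunk(byte[] data, int maxChunk) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += maxChunk) {
            int len = Math.min(maxChunk, data.length - off);
            byte[] c = new byte[len];
            System.arraycopy(data, off, c, 0, len);
            chunks.add(c);
        }
        return chunks;
    }

    static byte[] reassemble(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[2_500_000];        // ~2.5MB of serialized entities
        int limit = 1_000_000;                       // stay under the 1MB entity cap
        List<byte[]> chunks = chunk(payload, limit); // store each chunk as one entity;
                                                     // the task carries only their keys
        System.out.println(chunks.size());           // -> 3
    }
}
```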
On Thu, Feb 25, 2010 at 12:08 PM, Guillermo Schwarz
 wrote:
> I think there is a way to grab big chunks of operations, put them in a
> queue to be done asynchronously and that would be it.
>
> My take would be that using proxies it would be easy to queue any long
> operation transparently. I've done that with EJBs in the past, I don't
> see the reason a QueueingProxy couldn't be written.
>
> Cheers,
> Guillermo.
>
> On 25 feb, 17:02, Jeff Schnitzer  wrote:
>> I don't think the original poster had a requirement for synchronous
>> writes; he just didn't want to do the writes asynchronously because it
>> involved a lot more code.
>>
>> I'm also perfectly fine with asynchronous writes and a very lax
>> interpretation of consistency.  I don't even mind writing extra code.
>> The thing I worry about is the feasibility of a heavy write load and
>> the total cost of it.
>>
>> Unfortunately I really can't describe in detail what I want to do (I
>> normally laugh at this kind of secrecy, but in this case it's
>> warranted).  For the game mechanic I'm thinking about, the
>> average-case scenario is not very far from the worst-case scenario.
>> Just a little detail:
>>
>>  * There is no requirement that all of a user's friends must be
>> playing the game or even have installed the app to receive points.
>> Welcome to the world of social gaming: you can play without even knowing it!
>>  * There are *lots* of FB users that have > 1k friends.  Probably
>> millions.  More active FB users are likely to have more friends... and
>> more likely to use my app.
>>  * Points can be assigned to multiple layers, so the # of updates is
>> (layers * friends).
>>  * Tens of thousands of people play this "game". It could become
>> hundreds of thousands very soon.  If I'm lucky, millions.
>>
>> I would love to implement this game mechanic, but I just can't.
>> Asynchronous or not, it's *way* too expensive on appengine.  When it
>> comes time to implement this feature (and it's going to come, I can
>> see the winds blowing), I'm probably going to have to move my scoring
>> system out of appengine.  Which is a bit ironic, because one of the
>> main advantages of appengine is scalability.
>>
>> I would *love* to see some sort of super-super-lax and
>> super-super-cheap consistency option for BigTable.  Or even an
>> alternative key/value datastore that simply works like a persistent
>> version of memcached.  Something that would let me sustain 10k
>> writes/sec without bankrupting me.
>>
>> Jeff
>>
>> On Thu, Feb 25, 2010 at 11:16 AM, Ikai L (Google)  wrote:
>>
>> > Jeff, point taken, but the original poster has been asking for three
>> > different requirements:
>> > - requirement to do all writes synchronously
>> > - sub-some-couple-hundred-millisecond writes
>> > - 12k entities being written
>>
>> > This just won't scale well if it's common. Messaging users can be done
>> > asynchronously, as can the portion crediting friends. I understand the
>> > argument that you may want to do this during the lifecycle of the request 
>> > so
>> > the original user gets some kind of feedback backed by a strongly 
>> > consistent
>> > datastore, but this just isn't done. Feedback is usually faked out
>> > optimistically, assuming that the writes will all be successful with some
>> > cached layer being the only part of the stack being updated inside the
>> > request. Thinking of worst-case scenarios is a good thought exercise, but
>> > it's also a bit too optimistic to design a product assuming all of a user's
>> > friends will play the game and engineer to meet that unrealistic
>> > expectation. What are the standard and slightly non-standard use cases? I'd
>> > probably look at a solution where I can store the data somewhere associated
>> > with the original user for any users not already in the datastore, then
>> > retrieve and generate a score for any of that User's friends on first
>> > access. Facebook's developer admin tool has some pretty good statistics 
>> > such
>> > as bounce rate, block rate and invitation accept rate that can be used to
>> > tune this design.
>> > Slightly off top

Re: [appengine-java] Re: Can pm.makePersistentAll() help me write 12,000 objects?

2010-02-25 Thread Ikai L (Google)
We have an issue for an asynchronous write API for the datastore:

http://code.google.com/p/googleappengine/issues/detail?id=2817

This is something that can fit into that model.
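Until such an API lands, an app-level approximation is to wrap the (normally blocking) put in an executor and hand the caller a Future. The store interface below is an invented stand-in for illustration, not the App Engine datastore API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical async-write wrapper: the caller gets a Future back
// immediately while the blocking put runs on a worker thread.
public class AsyncWriter {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // Stand-in for the datastore.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public Future<String> putAsync(String key, String value) {
        return pool.submit(() -> {
            store.put(key, value); // a real synchronous datastore put would go here
            return key;            // echo the key, like a put returning its Key
        });
    }

    public String get(String key) {
        return store.get(key);
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

The caller can fire many `putAsync` calls and only block at the end, when it collects the futures, which overlaps the write latencies instead of serializing them.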

On Thu, Feb 25, 2010 at 12:26 PM, Scott Hernandez
wrote:

> Guillermo,
>
> Task queue payloads are limited to 10KB
> (
> http://code.google.com/appengine/docs/java/taskqueue/overview.html#Quotas_and_Limits
> ).
> The basic idea is that if you have more data than that you put it into
> an entity (in the data-store) and have the task pull it out and
> process it. It might be that you can persist those 12K entities in a
> lot of large entities (that are still under 1MB each), but that is
> a lot of work for something that will still probably fail. I guess it
> all depends where the cost is on the puts (indexing, raw writes by
> bytes, number of items).
>
> And if your comeback is "memcache", well, I won't even start
> discussing using a non-persistent, volatile, store like that for
> temporary storage while you write them to the datastore... in batches
> using the taskqueue/cron/etc.
>
> Really, there needs to be something that can handle the write volume.
> On Thu, Feb 25, 2010 at 12:08 PM, Guillermo Schwarz
>  wrote:
> > I think there is a way to grab big chunks of operations, put them in a
> > queue to be done asynchronously and that would be it.
> >
> > My take would be that using proxies it would be easy to queue any long
> > operation transparently. I've done that with EJBs in the past, I don't
> > see the reason a QueueingProxy couldn't be written.
> >
> > Cheers,
> > Guillermo.
> >
> > On 25 feb, 17:02, Jeff Schnitzer  wrote:
> >> I don't think the original poster had a requirement for synchronous
> >> writes; he just didn't want to do the writes asynchronously because it
> >> involved a lot more code.
> >>
> >> I'm also perfectly fine with asynchronous writes and a very lax
> >> interpretation of consistency.  I don't even mind writing extra code.
> >> The thing I worry about is the feasibility of a heavy write load and
> >> the total cost of it.
> >>
> >> Unfortunately I really can't describe in detail what I want to do (I
> >> normally laugh at this kind of secrecy, but in this case it's
> >> warranted).  For the game mechanic I'm thinking about, the
> >> average-case scenario is not very far from the worst-case scenario.
> >> Just a little detail:
> >>
> >>  * There is no requirement that all of a user's friends must be
> >> playing the game or even have installed the app to receive points.
> >> Welcome to the world of social gaming: you can play without even knowing it!
> >>  * There are *lots* of FB users that have > 1k friends.  Probably
> >> millions.  More active FB users are likely to have more friends... and
> >> more likely to use my app.
> >>  * Points can be assigned to multiple layers, so the # of updates is
> >> (layers * friends).
> >>  * Tens of thousands of people play this "game". It could become
> >> hundreds of thousands very soon.  If I'm lucky, millions.
> >>
> >> I would love to implement this game mechanic, but I just can't.
> >> Asynchronous or not, it's *way* too expensive on appengine.  When it
> >> comes time to implement this feature (and it's going to come, I can
> >> see the winds blowing), I'm probably going to have to move my scoring
> >> system out of appengine.  Which is a bit ironic, because one of the
> >> main advantages of appengine is scalability.
> >>
> >> I would *love* to see some sort of super-super-lax and
> >> super-super-cheap consistency option for BigTable.  Or even an
> >> alternative key/value datastore that simply works like a persistent
> >> version of memcached.  Something that would let me sustain 10k
> >> writes/sec without bankrupting me.
> >>
> >> Jeff
> >>
> >> On Thu, Feb 25, 2010 at 11:16 AM, Ikai L (Google) 
> wrote:
> >>
> >> > Jeff, point taken, but the original poster has been asking for three
> >> > different requirements:
> >> > - requirement to do all writes synchronously
> >> > - sub-some-couple-hundred-millisecond writes
> >> > - 12k entities being written
> >>
> >> > This just won't scale well if it's common. Messaging users can be done
> >> > asynchronously, as can the portion crediting friends. I understand the
> >> > argument that you may want to do this during the lifecycle of the
> request so
> >> > the original user gets some kind of feedback backed by a strongly
> consistent
> >> > datastore, but this just isn't done. Feedback is usually faked out
> >> > optimistically, assuming that the writes will all be successful with
> some
> >> > cached layer being the only part of the stack being updated inside the
> >> > request. Thinking of worst-case scenarios is a good thought exercise,
> but
> >> > it's also a bit too optimistic to design a product assuming all of a
> user's
> >> > friends will play the game and engineer to meet that unrealistic
> >> > expectation. What are the standard and slightly non-standard use
> cases? I'd
> >> > probably l