Re: [appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-06 Thread Ikai Lan (Google)
Luke, thanks for the follow up! You're right that RPC overhead can
sometimes add up, especially with something as fast as Memcache, so
batching things is definitely your friend. With the datastore, the RPC
overhead should be a much smaller share of the overall operation, so
you see the real benefit when you do, well, exactly what you did:
switching to async operations and batching wherever possible.

This is probably my favorite post of the morning. Here's hoping more
developers see this and are inspired by your optimizations =).

--
Ikai Lan
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine



[appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-04 Thread Luke
i finished updating my server to use the AsyncDatastoreService.  i
also cleaned up my memcache code to batch cache requests.  both of
these changes allowed me to improve request time by up to 4x for
some requests, from ~80ms to ~20ms.  now i can prefetch content for
the user with little to no penalty to request latency.  in fact, much
content will have no latency thanks to prefetching :)

the server used to get and set cached objects in memcache for each
command in a batch.  if i have 4 commands in a batch, that could be up
to 8 memcache RPCs as well as the actual work for those commands.
that was pretty wasteful.  so i updated my server to batch all gets
into a getAll, and all puts into a putAll.  that made a big
difference.  each getAll and putAll takes about as long as a single get or put, but now i
have no more than two memcache calls no matter how many commands are
in a batch.  if everything hits the cache, then i don't even need to
do a put...the entire request will finish in about 6ms.
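
roughly, the batching looks something like this (a simplified sketch,
not my actual code -- Command, cacheKey(), execute() and respondWith()
stand in for whatever your commands look like):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

void handleBatch(List<Command> batch) {
  MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

  // one cache key per command in the batch
  List<String> keys = new ArrayList<String>();
  for (Command cmd : batch) {
    keys.add(cmd.cacheKey());
  }

  // memcache RPC #1: a single getAll instead of one get per command
  Map<String, Object> cached = cache.getAll(keys);

  // run only the commands that missed, remembering what to write back
  Map<String, Object> toPut = new HashMap<String, Object>();
  for (Command cmd : batch) {
    Object result = cached.get(cmd.cacheKey());
    if (result == null) {
      result = cmd.execute();              // the real (datastore) work
      toPut.put(cmd.cacheKey(), result);
    }
    cmd.respondWith(result);
  }

  // memcache RPC #2: a single putAll for all the misses
  if (!toPut.isEmpty()) {
    cache.putAll(toPut);
  }
}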

the server also used the synchronous datastore service, so all i/o was
blocking.  now it's been updated to use AsyncDatastoreService.  the
server can kick off all i/o for each command at the beginning of the
request and gather up the results when they finish.
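
and the async get side is basically: start every datastore get up
front, then gather the results at the end.  a sketch (again, Command
and key() are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

void handleBatch(List<Command> batch) throws Exception {
  AsyncDatastoreService ds = DatastoreServiceFactory.getAsyncDatastoreService();

  // kick off the i/o for every command before touching any result;
  // each get() returns immediately with a Future and the RPCs overlap
  List<Future<Entity>> pending = new ArrayList<Future<Entity>>();
  for (Command cmd : batch) {
    pending.add(ds.get(cmd.key()));
  }

  // gather the results; much of the i/o has already finished by now
  for (int i = 0; i < batch.size(); i++) {
    batch.get(i).respondWith(pending.get(i).get());
  }
}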

my queries are still blocking...but that doesn't seem to be much of an
impact for now.

thanks to the app engine team for delivering this interface :)

Re: [appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-03 Thread Jeff Schnitzer
Does it take so much time to process your results that it really
matters that they be done in the optimal order?

All that polling code is complicated... unless you're shaving off a
lot of real-world time, it seems better to just launch all the
batches and block on the first one.
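
i.e. something roughly like this instead of a poll loop (just a
sketch; Work, Result, start() and process() are stand-ins):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

void runAll(List<Work> workItems) throws Exception {
  // fire off every async RPC first...
  List<Future<Result>> futures = new ArrayList<Future<Result>>();
  for (Work w : workItems) {
    futures.add(w.start());
  }
  // ...then consume in submission order; anything that finished "out of
  // order" is simply already done by the time we call get() on it
  for (Future<Result> f : futures) {
    process(f.get());
  }
}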

Jeff

[appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-01 Thread Luke
great, thanks for the insight max.

i have a client that will batch together multiple requests into one
RPC call to my app on GAE.  each of these individual requests may have
one or more datastore accesses.  this may include some prefetch
requests.

so i want to build a mechanism that will interleave these requests
taking advantage of the AsyncDatastoreService for minimum request
latency.

i've gone through my server-side stack and made it asynchronous by
wrapping RPC returns in Future objects.  then i've created a
FutureChain object that takes one or more Future objects as input, and
will return one Future object.  i then have some code that will poll
the ultimate Future objects until all of them have finished.

it ends up being a simple multi-threaded emulation where each
individual request in a batch gets a thread and each thread gives up
control when it makes an Async request.
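
the poll loop at the end is nothing fancy.  roughly this shape (a
sketch only, FutureChain internals left out):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Future;

// poll the final Future of each command until every one has finished.
// the real version hands control back to the other chains between
// passes; this just shows the shape of the loop.
void drain(List<Future<?>> ultimate) throws Exception {
  List<Future<?>> remaining = new ArrayList<Future<?>>(ultimate);
  while (!remaining.isEmpty()) {
    Iterator<Future<?>> it = remaining.iterator();
    while (it.hasNext()) {
      Future<?> f = it.next();
      if (f.isDone()) {
        f.get();        // surface any failure and consume the result
        it.remove();
      }
    }
  }
}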

now for the PreparedQuery, because my app knows how many items i want,
i should be able to wrap it in a special Future object that will try
to pull in that many items when it is polled...but the problem is, i
don't know when the batch has come back, so every time i call next(),
i risk blocking on I/O when i could be initiating another I/O
asynchronously or processing the results of an async I/O.

so until there is explicit knowledge of when the I/O for a batch has
finished, i may be able to get away with reducing the poll-rate of
queries.

i suppose i could just query for the keys, then i could use an
explicit Async method to fetch the entities themselves.  if i query
for keys, will they be split up in batches?  any way to know how many
keys will be in one batch?
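
if i went the keys route, i imagine it would look roughly like this
(a sketch; "Content" and the limit are made up, and the query itself
is still synchronous):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Future;
import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

Future<Map<Key, Entity>> prefetch(AsyncDatastoreService ds, int limit) {
  // keys-only query: the result batches only carry keys
  Query q = new Query("Content").setKeysOnly();
  PreparedQuery pq = DatastoreServiceFactory.getDatastoreService().prepare(q);

  List<Key> keys = new ArrayList<Key>();
  for (Entity e : pq.asList(FetchOptions.Builder.withLimit(limit))) {
    keys.add(e.getKey());
  }

  // explicit async batch get for the entities; returns a Future immediately
  return ds.get(keys);
}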


On Nov 29, 11:08 am, Max Ross (Google) maxr+appeng...@google.com
wrote:
 Hi Luke,

 First the awesome news:
 As of 1.4.0, many queries are implicitly asynchronous.  When you call
 PreparedQuery.asIterable() or PreparedQuery.asIterator(), we initiate the
 query in the background and then immediately return.  This lets you do work
 while the first batch of results is being fetched.  And, when the first
 batch has been consumed we immediately request the next batch.  If you're
 performing a significant amount of work with each Entity as you iterate you
 will probably see a latency win as a result of this.
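
 For example (just an illustrative sketch; "Item", doOtherWork() and
 handle() are placeholders):

 import java.util.Iterator;
 import com.google.appengine.api.datastore.DatastoreService;
 import com.google.appengine.api.datastore.Entity;
 import com.google.appengine.api.datastore.Query;

 void example(DatastoreService ds) {
   // asIterator() kicks off the query for the first batch and returns right away
   Iterator<Entity> results = ds.prepare(new Query("Item")).asIterator();

   doOtherWork();   // overlaps with the fetch of the first batch

   while (results.hasNext()) {
     handle(results.next());   // later batches are requested as you consume them
   }
 }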

 Now the less awesome news:
 We didn't get around to making the List returned by PreparedQuery.asList()
 work this same magic, but you can expect this in a future release.

 Some deeper thoughts:
 The underlying RPCs between your app and the datastore fetch results in
 batches.  We fetch an initial batch of results, and once that batch has been
 consumed we fetch the next batch.  But, there's nothing in the API that maps
 to these batches - it's either a List containing the entire result set or an
 Iterable/Iterator that returns Entities one at a time.  An API that provides
 async access to the individual results returned by an Iterable/Iterator
 (Iterator<Future<Entity>>) doesn't really make sense since you don't know
 which call to hasNext() is going to require a new batch to be fetched, and
 without that knowledge, the knowledge of what is going to trigger something
 expensive, you can't really make appropriate use of an asynchronous API.

 Going forward, we're definitely interested in exposing these batches
 directly, and an explicitly async API for these batches makes a lot of sense
 since fetching these batches would map directly to something expensive on
 the server side.

 Hope this helps,
 Max

 On Fri, Nov 26, 2010 at 4:41 PM, Luke lvale...@gmail.com wrote:
  i was taking a look at the 1.4.0 javadoc for AsyncDatastoreService.  i
  see the get, put and delete operations return a Future, but the
  prepare methods return a naked PreparedQuery object, and it doesn't
  look like PreparedQuery has any async get methods.

  does the AsyncDatastoreService not support asynchronous queries, or is
  there something i'm missing?

  glad to see at least the get and put methods are async, hoping to get
  async queries too (as well as async interfaces to more services).
