Re: [appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-06 Thread Ikai Lan (Google)
Luke, thanks for the follow up! You're right that RPC overhead can
sometimes add up, especially with something as fast as Memcache, so
batching things is definitely your friend. With the datastore, the RPC
overhead should be a much smaller share of the overall operation, so
you see the real benefit when you do, well, exactly what you did:
switching to async operations and batching wherever possible.

This is probably my favorite post of the morning. Here's hoping more
developers see this and are inspired by your optimizations =).

--
Ikai Lan
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine



[appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-04 Thread Luke
i finished updating my server to use the AsyncDatastoreService.  i
also cleaned up my memcache code to batch cache requests.  both of
these changes allowed me to improve request time by up to 4x for
some requests, from ~80ms to ~20ms.  now i can prefetch content for
the user with little to no penalty to request latency.  in fact, much
content will have no latency thanks to prefetching :)

the server used to get and set cached objects in memcache for each
command in a batch.  if i have 4 commands in a batch, that could be up
to 8 memcache RPCs as well as the actual work for those commands.
that was pretty wasteful.  so i updated my server to batch all gets
into a getAll, and all puts into a putAll.  that made a big
difference.  each getAll and putAll takes about as long as a single get or put, but now i
have no more than two memcache calls no matter how many commands are
in a batch.  if everything hits the cache, then i don't even need to
do a put...the entire request will finish in about 6ms.
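
roughly, the batching looks something like this (a simplified sketch,
not my actual code -- Command, cacheKey(), execute() and respondWith()
stand in for whatever your commands look like):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

void handleBatch(List<Command> batch) {
  MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

  // one cache key per command in the batch
  List<String> keys = new ArrayList<String>();
  for (Command cmd : batch) {
    keys.add(cmd.cacheKey());
  }

  // memcache RPC #1: a single getAll instead of one get per command
  Map<String, Object> cached = cache.getAll(keys);

  // run only the commands that missed, remembering what to write back
  Map<String, Object> toPut = new HashMap<String, Object>();
  for (Command cmd : batch) {
    Object result = cached.get(cmd.cacheKey());
    if (result == null) {
      result = cmd.execute();              // the real (datastore) work
      toPut.put(cmd.cacheKey(), result);
    }
    cmd.respondWith(result);
  }

  // memcache RPC #2: a single putAll for all the misses
  if (!toPut.isEmpty()) {
    cache.putAll(toPut);
  }
}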

the server also used the synchronous datastore service, so all i/o was
blocking.  now it's been updated to use AsyncDatastoreService.  the
server can kick off all i/o for each command at the beginning of the
request and gather up the results when they finish.
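
and the async get side is basically: start every datastore get up
front, then gather the results at the end.  a sketch (again, Command
and key() are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

void handleBatch(List<Command> batch) throws Exception {
  AsyncDatastoreService ds = DatastoreServiceFactory.getAsyncDatastoreService();

  // kick off the i/o for every command before touching any result;
  // each get() returns immediately with a Future and the RPCs overlap
  List<Future<Entity>> pending = new ArrayList<Future<Entity>>();
  for (Command cmd : batch) {
    pending.add(ds.get(cmd.key()));
  }

  // gather the results; much of the i/o has already finished by now
  for (int i = 0; i < batch.size(); i++) {
    batch.get(i).respondWith(pending.get(i).get());
  }
}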

my queries are still blocking...but that doesn't seem to be much of an
impact for now.

thanks to the app engine team for delivering this interface :)

Re: [appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-03 Thread Jeff Schnitzer
Does it take so much time to process your results that it really
matters that they be done in the optimal order?

All that polling code is complicated... unless you're shaving off a
lot of real-world time, it seems better to just launch all the
batches and block on the first one.
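
i.e. something roughly like this instead of a poll loop (just a
sketch; Work, Result, start() and process() are stand-ins):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

void runAll(List<Work> workItems) throws Exception {
  // fire off every async RPC first...
  List<Future<Result>> futures = new ArrayList<Future<Result>>();
  for (Work w : workItems) {
    futures.add(w.start());
  }
  // ...then consume in submission order; anything that finished "out of
  // order" is simply already done by the time we call get() on it
  for (Future<Result> f : futures) {
    process(f.get());
  }
}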

Jeff

[appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?

2010-12-01 Thread Luke
great, thanks for the insight max.

i have a client that will batch together multiple requests into one
RPC call to my app on GAE.  each of these individual requests may have
one or more datastore accesses.  this may include some prefetch
requests.

so i want to build a mechanism that will interleave these requests
taking advantage of the AsyncDatastoreService for minimum request
latency.

i've gone through my server-side stack and made it asynchronous by
wrapping RPC returns in Future objects.  then i've created a
FutureChain object that takes one or more Future objects as input, and
will return one Future object.  i then have some code that will poll
the ultimate Future objects until all of them have finished.

it ends up being a simple multi-threaded emulation where each
individual request in a batch gets a thread and each thread gives up
control when it makes an Async request.
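
the poll loop at the end is nothing fancy.  roughly this shape (a
sketch only, FutureChain internals left out):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Future;

// poll the final Future of each command until every one has finished.
// the real version hands control back to the other chains between
// passes; this just shows the shape of the loop.
void drain(List<Future<?>> ultimate) throws Exception {
  List<Future<?>> remaining = new ArrayList<Future<?>>(ultimate);
  while (!remaining.isEmpty()) {
    Iterator<Future<?>> it = remaining.iterator();
    while (it.hasNext()) {
      Future<?> f = it.next();
      if (f.isDone()) {
        f.get();        // surface any failure and consume the result
        it.remove();
      }
    }
  }
}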

now for the PreparedQuery, because my app knows how many items i want,
i should be able to wrap it in a special Future object that will try
to pull in that many items when it is polled...but the problem is, i
don't know when the batch has come back, so every time i call next(),
i risk blocking on I/O when i could be initiating another I/O
asynchronously or processing the results of an async I/O.

so until there is explicit knowledge of when the I/O for a batch has
finished, i may be able to get away with reducing the poll-rate of
queries.

i suppose i could just query for the keys, then i could use an
explicit Async method to fetch the entities themselves.  if i query
for keys, will they be split up in batches?  any way to know how many
keys will be in one batch?
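
if i went the keys route, i imagine it would look roughly like this
(a sketch; "Content" and the limit are made up, and the query itself
is still synchronous):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Future;
import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

Future<Map<Key, Entity>> prefetch(AsyncDatastoreService ds, int limit) {
  // keys-only query: the result batches only carry keys
  Query q = new Query("Content").setKeysOnly();
  PreparedQuery pq = DatastoreServiceFactory.getDatastoreService().prepare(q);

  List<Key> keys = new ArrayList<Key>();
  for (Entity e : pq.asList(FetchOptions.Builder.withLimit(limit))) {
    keys.add(e.getKey());
  }

  // explicit async batch get for the entities; returns a Future immediately
  return ds.get(keys);
}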


On Nov 29, 11:08 am, Max Ross (Google) maxr+appeng...@google.com
wrote:
 Hi Luke,

 First the awesome news:
 As of 1.4.0, many queries are implicitly asynchronous.  When you call
 PreparedQuery.asIterable() or PreparedQuery.asIterator(), we initiate the
 query in the background and then immediately return.  This lets you do work
 while the first batch of results is being fetched.  And, when the first
 batch has been consumed we immediately request the next batch.  If you're
 performing a significant amount of work with each Entity as you iterate you
 will probably see a latency win as a result of this.
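
 For example (just an illustrative sketch; "Item", doOtherWork() and
 handle() are placeholders):

 import java.util.Iterator;
 import com.google.appengine.api.datastore.DatastoreService;
 import com.google.appengine.api.datastore.Entity;
 import com.google.appengine.api.datastore.Query;

 void example(DatastoreService ds) {
   // asIterator() kicks off the query for the first batch and returns right away
   Iterator<Entity> results = ds.prepare(new Query("Item")).asIterator();

   doOtherWork();   // overlaps with the fetch of the first batch

   while (results.hasNext()) {
     handle(results.next());   // later batches are requested as you consume them
   }
 }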

 Now the less awesome news:
 We didn't get around to making the List returned by PreparedQuery.asList()
 work this same magic, but you can expect this in a future release.

 Some deeper thoughts:
 The underlying RPCs between your app and the datastore fetch results in
 batches.  We fetch an initial batch of results, and once that batch has been
 consumed we fetch the next batch.  But, there's nothing in the API that maps
 to these batches - it's either a List containing the entire result set or an
 Iterable/Iterator that returns Entities one at a time.  An API that provides
 async access to the individual results returned by an Iterable/Iterator
 (Iterator<Future<Entity>>) doesn't really make sense since you don't know
 which call to hasNext() is going to require a new batch to be fetched, and
 without that knowledge, the knowledge of what is going to trigger something
 expensive, you can't really make appropriate use of an asynchronous API.

 Going forward, we're definitely interested in exposing these batches
 directly, and an explicitly async API for these batches makes a lot of sense
 since fetching these batches would map directly to something expensive on
 the server side.

 Hope this helps,
 Max

 On Fri, Nov 26, 2010 at 4:41 PM, Luke lvale...@gmail.com wrote:
  i was taking a look at the 1.4.0 javadoc for AsyncDatastoreService.  i
  see the get, put and delete operations return a Future, but the
  prepare methods return a naked PreparedQuery object, and it doesn't
  look like PreparedQuery has any async get methods.

  does the AsyncDatastoreService not support asynchronous queries, or is
  there something i'm missing?

  glad to see at least the get and put methods are async, hoping to get
  async queries too (as well as async interfaces to more services).
