Re: [appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?
Luke, thanks for the follow up! You're right that sometimes RPC overhead can add up, especially with something as fast as Memcache, so batching things is definitely your friend. With the datastore, the RPC overhead should be a much smaller percentage overall of operations, so you see real benefit when you do, well, exactly what you did: switching to async operations and batching when possible.

This is probably my favorite post of the morning. Here's hoping more developers see this and are inspired by your optimizations =).

--
Ikai Lan
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine

On Sat, Dec 4, 2010 at 10:55 PM, Luke lvale...@gmail.com wrote:

i finished updating my server to use the AsyncDatastoreService. i also cleaned up my memcache code to batch cache requests. both of these changes allowed me to improve the request time by up to 4x for some requests, from ~80ms to ~20ms. now i can prefetch content for the user with little to no penalty to request latency. in fact, much content will have no latency thanks to prefetching :)

the server used to get and set cached objects in memcache for each command in a batch. if i have 4 commands in a batch, that could be up to 8 memcache RPCs as well as the actual work for those commands. that was pretty wasteful. so i updated my server to batch all gets into a getAll, and all puts into a putAll. that made a big difference. the length of each get and put are the same, but now i have no more than two memcache calls no matter how many commands are in a batch. if everything hits the cache, then i don't even need to do a put...the entire request will finish in about 6ms.

the server also used the synchronous datastore service, so all i/o was blocking. now it's been updated to use AsyncDatastoreService.
the server can kick off all i/o for each command at the beginning of the request and gather up the results when they finish. my queries are still blocking...but that doesn't seem to be much of an impact for now.

thanks to the app engine team for delivering this interface :)
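
The memcache change in Luke's Dec 4 message (one getAll and at most one putAll per batch of commands) can be sketched roughly as below. This is only an illustration under stated assumptions: `CountingCache` is a hypothetical in-memory stand-in that counts round trips, not the App Engine memcache API, and `compute` is a made-up placeholder for the real work of a command.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a remote cache. Every method call
// counts as one round trip, which is the cost that batching tries to cut.
final class CountingCache {
    private final Map<String, String> store = new HashMap<>();
    int roundTrips = 0;

    // Returns only the keys that were found, in a single round trip.
    Map<String, String> getAll(Collection<String> keys) {
        roundTrips++;
        Map<String, String> hits = new HashMap<>();
        for (String key : keys) {
            if (store.containsKey(key)) {
                hits.put(key, store.get(key));
            }
        }
        return hits;
    }

    // Stores every entry in a single round trip.
    void putAll(Map<String, String> values) {
        roundTrips++;
        store.putAll(values);
    }
}

final class BatchedCommands {
    // Hypothetical "actual work" for a command whose result was not cached.
    static String compute(String command) {
        return "result:" + command;
    }

    // One getAll for the whole batch, then at most one putAll for the misses.
    static Map<String, String> run(CountingCache cache, List<String> commands) {
        Map<String, String> hits = cache.getAll(commands);
        Map<String, String> results = new LinkedHashMap<>();
        Map<String, String> misses = new HashMap<>();
        for (String command : commands) {
            String value = hits.get(command);
            if (value == null) {
                value = compute(command);
                misses.put(command, value);
            }
            results.put(command, value);
        }
        if (!misses.isEmpty()) {
            cache.putAll(misses);   // skipped entirely when everything hit the cache
        }
        return results;
    }
}
```

With four commands, a cold cache costs two round trips in this sketch and a fully warm cache costs one, regardless of how many commands are in the batch.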
Re: [appengine-java] Re: no async queries on AsyncDatastoreService for 1.4.0?
Does it take so much time to process your results that it really matters they be done in the optimal order? All that polling code is complicated... unless you're shaving off a lot of real-world time, seems like it's better to just launch all batches and block on the first one.

Jeff

On Wed, Dec 1, 2010 at 8:44 PM, Luke lvale...@gmail.com wrote:

great, thanks for the insight max.

i have a client that will batch together multiple requests into one RPC call to my app on GAE. each of these individual requests may have one or more datastore accesses. this may include some prefetch requests. so i want to build a mechanism that will interleave these requests taking advantage of the AsyncDatastoreService for minimum request latency.

i've gone through my server-side stack and made it asynchronous by wrapping RPC returns in Future objects. then i've created a FutureChain object that takes one or more Future objects as input, and will return one Future object. i then have some code that will poll the ultimate Future objects until all of them have finished. it ends up being a simple multi-threaded emulation where each individual request in a batch gets a thread and each thread gives up control when it makes an Async request.

now for the PreparedQuery, because my app knows how many items i want, i should be able to wrap it in a special Future object that will try to pull in that many items when it is polled...but the problem is, i don't know when the batch has come back, so every time i call next(), i risk blocking on I/O when i could be initiating another I/O asynchronously or processing the results of an async I/O. so until there is explicit knowledge of when the I/O for a batch has finished, i may be able to get away with reducing the poll-rate of queries.

i suppose i could just query for the keys, then i could use an explicit Async method to fetch the entities themselves. if i query for keys, will they be split up in batches?
any way to know how many keys will be in one batch?

On Nov 29, 11:08 am, Max Ross (Google) maxr+appeng...@google.com wrote:

Hi Luke,

First the awesome news: As of 1.4.0, many queries are implicitly asynchronous. When you call PreparedQuery.asIterable() or PreparedQuery.asIterator(), we initiate the query in the background and then immediately return. This lets you do work while the first batch of results is being fetched. And, when the first batch has been consumed we immediately request the next batch. If you're performing a significant amount of work with each Entity as you iterate you will probably see a latency win as a result of this.

Now the less awesome news: We didn't get around to making the List returned by PreparedQuery.asList() work this same magic, but you can expect this in a future release.

Some deeper thoughts: The underlying RPCs between your app and the datastore fetch results in batches. We fetch an initial batch of results, and once that batch has been consumed we fetch the next batch. But, there's nothing in the API that maps to these batches - it's either a List containing the entire result set or an Iterable/Iterator that returns Entities one at a time.

An API that provides async access to the individual results returned by an Iterable/Iterator (Iterator<Future<Entity>>) doesn't really make sense since you don't know which call to hasNext() is going to require a new batch to be fetched, and without that knowledge, the knowledge of what is going to trigger something expensive, you can't really make appropriate use of an asynchronous API.

Going forward, we're definitely interested in exposing these batches directly, and an explicitly async API for these batches makes a lot of sense since fetching these batches would map directly to something expensive on the server side.

Hope this helps,
Max

On Fri, Nov 26, 2010 at 4:41 PM, Luke lvale...@gmail.com wrote:

i was taking a look at the 1.4.0 javadoc for AsyncDatastoreService.
i see the get, put and delete operations return a Future, but the prepare methods return a naked PreparedQuery object, and it doesn't look like PreparedQuery has any async get methods. does the AsyncDatastoreService not support asynchronous queries, or is there something i'm missing?

glad to see at least the get and put methods are async, hoping to get async queries too (as well as async interfaces to more services).

--
You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group.
To post to this group, send email to google-appengine-j...@googlegroups.com.
To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
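
Jeff's suggestion in this thread (launch everything first, then simply block on the futures in order) might look something like the sketch below. It is a minimal illustration, not code from anyone in the thread: the class and method names are hypothetical, and a thread pool stands in for the async RPCs. On App Engine the Future objects would come from the AsyncDatastoreService get, put and delete calls instead. The `allDone` helper is a much simpler cousin of the FutureChain idea: a single "is everything finished" check over several futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class LaunchThenBlock {
    // Starts every task before waiting on any of them, then blocks on the
    // futures in submission order. Because the work overlaps, the total wait
    // is roughly that of the slowest task rather than the sum of all tasks.
    static <T> List<T> runAll(ExecutorService pool, List<Callable<T>> tasks) {
        List<Future<T>> pending = new ArrayList<>();
        for (Callable<T> task : tasks) {
            pending.add(pool.submit(task));     // kick off, do not wait yet
        }
        List<T> results = new ArrayList<>();
        for (Future<T> future : pending) {
            try {
                results.add(future.get());      // blocks; later futures are often done already
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }

    // Non-blocking check across several futures, usable from a polling loop.
    static boolean allDone(List<? extends Future<?>> futures) {
        for (Future<?> future : futures) {
            if (!future.isDone()) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off Jeff points at is visible here: `runAll` needs no polling loop at all, and it only loses time when results could usefully be processed in completion order rather than submission order.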
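
The batch behaviour Max describes (start the first fetch immediately, request the next batch as soon as the current one has been consumed, and block inside hasNext() only when a batch has not arrived yet) can be modelled with the sketch below. This is an assumption-laden illustration of the idea, not how the SDK implements PreparedQuery: `BatchPrefetchingIterator` and its `fetchBatch` callback are hypothetical names, and an empty batch is assumed to mark the end of the results.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical iterator over results that arrive in batches.
// fetchBatch.apply(n) must start fetching batch n and return at once;
// an empty batch signals the end of the result set.
final class BatchPrefetchingIterator<T> implements Iterator<T> {
    private final IntFunction<Future<List<T>>> fetchBatch;
    private Future<List<T>> pending;    // batch that was requested but not yet read
    private Iterator<T> current = Collections.emptyIterator();
    private int nextBatch = 0;
    private boolean exhausted = false;

    BatchPrefetchingIterator(IntFunction<Future<List<T>>> fetchBatch) {
        this.fetchBatch = fetchBatch;
        this.pending = fetchBatch.apply(nextBatch++);   // start the query, return immediately
    }

    @Override
    public boolean hasNext() {
        // Only this branch can block, and the caller cannot tell in advance
        // which call to hasNext() will take it.
        if (!current.hasNext() && !exhausted) {
            List<T> batch = await(pending);
            pending = null;
            if (batch.isEmpty()) {
                exhausted = true;
            } else {
                current = batch.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = current.next();
        if (!current.hasNext()) {
            // Current batch is consumed: request the next one right away so it
            // can load while the caller works on this item.
            pending = fetchBatch.apply(nextBatch++);
        }
        return item;
    }

    private static <E> List<E> await(Future<List<E>> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The sketch also shows why a per-item async API is awkward: from the outside, every hasNext() call looks the same, yet only the ones that cross a batch boundary can block.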