Hi All,

As I understand it, performing a single fetch (a call to get()) from the datastore using a key basically involves finding the host housing the entity, opening a socket, fetching the data, and then cleaning up the connection. So to fetch something like 30 entities from the datastore, you're repeating that process 30 times over in serial, each time incurring whatever overhead is involved. I also read that if you perform bulk fetches (i.e. passing multiple keys at once), you can eliminate a great deal of that overhead. In one of the videos I watched from Google I/O 2009, the presenter (whose name I forget - d'oh) said that a bulk fetch actually performs the fetches in parallel from the datastore and you should see requests complete noticeably faster.
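To make the serial-versus-bulk difference concrete, here is a minimal sketch that only counts round trips. It does not use the real App Engine API: FakeDatastore, get(), and get_multi() are hypothetical names for an in-memory stand-in, and the 30 keys are made up for illustration.

```python
class FakeDatastore:
    """In-memory stand-in for a datastore that counts round trips.

    Hypothetical helper for illustration only, not the App Engine API.
    """

    def __init__(self, entities):
        self._entities = dict(entities)
        self.round_trips = 0

    def get(self, key):
        """Fetch one entity: one round trip per call."""
        self.round_trips += 1
        return self._entities.get(key)

    def get_multi(self, keys):
        """Fetch many entities in one round trip, preserving key order."""
        self.round_trips += 1
        return [self._entities.get(k) for k in keys]


keys = ["key%d" % i for i in range(30)]
store = FakeDatastore((k, {"id": k}) for k in keys)

# Serial: one call, and therefore one round trip, per key.
serial = [store.get(k) for k in keys]
serial_trips = store.round_trips

# Bulk: all 30 keys handed over in a single call.
store.round_trips = 0
bulk = store.get_multi(keys)
bulk_trips = store.round_trips

print(serial_trips, bulk_trips)  # prints "30 1"
```

Both approaches return the same entities; only the number of round trips, and so the per-call overhead paid, differs.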
Currently I have a few situations where the app performs many fetches from the datastore serially rather than in bulk, and I believe that is why these requests are extremely slow and CPU intensive. Where possible, I have put in as many bulk fetches as I can, but I'm a little stuck in a few places.

I'm basing the fetch latency on today's numbers -- http://code.google.com/status/appengine/detail/datastore/2010/04/19. Anomalies aside, it looks like the get latency is somewhere between 80ms and 160ms; let's split the difference and just say that it's 120 milliseconds. Additionally, the query latency is somewhere between 250ms and 500ms. Splitting the difference, that's 375ms. I'm just going to use those numbers as a ballpark estimate for fetching multiple entities from the datastore; feel free to correct me if any of my reasoning is flawed or incorrect.

Example 1: http://imagepaste.nullnetwork.net/viewimage.php?id=830

Given the above example, I'm assuming that if I performed an ancestor query with Foo("A") as the ancestor, it would effectively bulk-fetch the entire entity group. I could then use the result of that query to get the data I need. That would make the fetch from the datastore one query at 375 milliseconds, versus (7 entities * 160ms) or 1120ms for individual gets, taking the 160ms upper bound to be conservative. So long as you need 3 or more entities (3 * 160ms = 480ms), it would stand to reason that you're better off just fetching the whole thing. In some simple tests I did, that seemed to be the case: the query approach was faster, and that's great if you know everything is in the same entity group.

Example 2: http://imagepaste.nullnetwork.net/viewimage.php?id=831

Given the above example, none of the entities are in the same entity group, but I would want to perform bulk fetches wherever possible. I would first fetch Foo("A"). I would then see that it has two key properties pointing to Bar("B") and Bar("C"), and perform a fetch of those two entities at once.
Finally, I would see that Bar("B") and Bar("C") each reference two more entities -- Baz("D"), Baz("E"), Baz("F"), and Baz("G") -- for a total of four, and fetch those together. In the worst case, I would fetch each entity individually, taking, once again, 1120ms. In the best case, where I perform 3 fetches (fetch A first, then B and C, then lastly D, E, F, and G), it would be more in the neighborhood of 480 milliseconds. It's still an improvement over fetching each entity individually, but not much.

So I was thinking of ways to improve this, the second example in particular, because I have a few places in my app where that exact thing is happening. Right now it's actually implemented with individual fetches, but it is backed by memcache in many circumstances, so that definitely helps.

So given that, here are my questions:

- When serializing the objects, would it be worthwhile adding some sort of metadata to the entity that would tell me what other entities it references (either directly or indirectly), so that I could fetch the whole thing with one or two API calls? I was thinking that an entity could have child entities holding all the keys it references directly or indirectly. This would be a huge pain to implement, and I'm not sure it would give a noticeable performance boost.
- Is there something "under the covers" of the API that makes more efficient use of resources that I don't know about?
- Is there something in the API that I don't know about that could make the second example faster without much effort?
- Is my design just bad, and should I figure out a better way of doing it? If so, how would I go about doing that?

Alright, that's all for now.

Thanks,
Patrick.

--
Patrick H. Twohig.
Namazu Studios
P.O. Box 34161
San Diego, CA 92163-4161
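P.S. For reference, a rough sketch of the level-by-level batching described in Example 2, with the ballpark arithmetic attached. Everything here is an assumption made for illustration: REFERENCES is a made-up stand-in for the key properties (which Baz entities hang off which Bar is a guess), fake_batch_get() stands in for a real bulk get, and GET_MS is the 160ms upper figure used in the estimates above.

```python
GET_MS = 160  # assumed latency per get call, the upper figure used above

# Example 2's graph: each key maps to the keys its entity references.
# Which Baz belongs to which Bar is assumed for illustration.
REFERENCES = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}


def fetch_by_level(root, batch_get):
    """Fetch everything reachable from root, one batch call per level."""
    fetched = {}
    calls = 0
    frontier = [root]
    while frontier:
        results = batch_get(frontier)  # one bulk call for the whole level
        calls += 1
        fetched.update(zip(frontier, results))
        # Next level: referenced keys not fetched yet, without duplicates.
        next_frontier = []
        for refs in results:
            for key in refs:
                if key not in fetched and key not in next_frontier:
                    next_frontier.append(key)
        frontier = next_frontier
    return fetched, calls


def fake_batch_get(keys):
    """Hypothetical bulk get: returns each entity's reference list."""
    return [REFERENCES[k] for k in keys]


entities, calls = fetch_by_level("A", fake_batch_get)
batched_ms = calls * GET_MS             # 3 calls * 160ms
individual_ms = len(entities) * GET_MS  # 7 calls * 160ms
print(len(entities), calls, batched_ms, individual_ms)  # prints "7 3 480 1120"
```

The number of calls equals the depth of the reference graph rather than the number of entities, which is why storing the full set of referenced keys on the root entity could, in principle, cut it down to two calls.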