[google-appengine] Re: "Ancestor is" performance
Oops, missed the second reference:

[2] Section "Tips for using entity groups:" in
http://code.google.com/appengine/docs/python/datastore/keysandentitygroups.html

On Jan 19, 9:10 am, Tony Arkles wrote:
> Hi everyone!
>
> In a thread [1], and in the documentation [2], it says that setting
> ancestors doesn't affect performance, but I'm not sure that this is
> the case.
>
> I set up two queries, one using "WHERE locationKey = :1" (locationKey
> is a db.StringProperty), and one using "WHERE ANCESTOR IS :1" (the
> ancestor is an entity created based on the locationKey).
>
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000 ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)
>
> Does anyone have any thoughts on this? Did I mess something up, or is
> there something from the documentation, or is it something else
> entirely?
>
> Cheers
> Tony
>
> [1] http://groups.google.com/group/google-appengine/browse_thread/thread/...

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---
[google-appengine] Re: "Ancestor is" performance
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000 ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)
>
> Does anyone have any thoughts on this? Did I mess something up, or is
> there something from the documentation, or is it something else
> entirely?

From your second link:

    All entities in a group are stored in the same datastore node.

I guess this means that entities from the same group are stored close
to each other. When your query uses "ANCESTOR IS", the query engine
can take advantage of this. Just a speculation though...
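
The locality idea speculated about above can be pictured with a toy model. This is only a sketch under stated assumptions, not the datastore's actual implementation: it assumes entity rows are kept sorted by key path, so that every descendant of an ancestor sits in one contiguous range. The kinds "Location" and "Reading", the sample rows, and the function ancestor_scan are all hypothetical names made up for the illustration.

```python
from bisect import bisect_left

# Hypothetical toy "entities table": rows sorted by key path.
# A key path is a tuple of (kind, name) pairs, root first.
ROWS = sorted([
    ((("Location", "yxe"),), "root yxe"),
    ((("Location", "yxe"), ("Reading", "r1")), "reading 1"),
    ((("Location", "yxe"), ("Reading", "r2")), "reading 2"),
    ((("Location", "yyc"),), "root yyc"),
    ((("Location", "yyc"), ("Reading", "r3")), "reading 3"),
])


def ancestor_scan(rows, ancestor):
    """Return the values of all rows whose key path starts with `ancestor`.

    Because rows are sorted by key path, the matching rows are adjacent:
    seek once to the ancestor's own key, then read forward until the
    prefix stops matching.
    """
    keys = [key for key, _ in rows]
    start = bisect_left(keys, ancestor)
    found = []
    for key, value in rows[start:]:
        if key[:len(ancestor)] != ancestor:
            break  # left the entity group, nothing further can match
        found.append(value)
    return found
```

In this model, ancestor_scan(ROWS, (("Location", "yxe"),)) reads three adjacent rows and stops, without touching the "yyc" group at all.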
[google-appengine] Re: "Ancestor is" performance
On Jan 19, 5:42 pm, Alexander Kojevnikov wrote:
> From your second link:
>
> All entities in a group are stored in the same datastore node.
>
> I guess this means that entities from the same group are stored close
> to each other. When your query uses "ANCESTOR IS", the query engine
> can take advantage of this. Just a speculation though...

Yeah, one idea we tossed around at the office was that it knows
*which* node to go to to fetch the entities. It'd be awesome to get
clarification about this though, it looks like a pretty big
performance boost.

Additionally, the ms-cpu advantage is lost if you add an "ORDER BY"
clause to the GQL.
[google-appengine] Re: "Ancestor is" performance
On Jan 19, 7:10 am, Tony Arkles wrote:
>
> In a thread [1], and in the documentation [2], it says that setting
> ancestors doesn't affect performance, but I'm not sure that this is
> the case.
...
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000 ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)

the important distinction to make here is between performance and
cost. the thread and doc you cited are correct in that ancestor
queries don't differ in performance from non-ancestor queries.
however, they may differ in their CPU cost, which is what you noticed.

On Jan 19, 3:42 pm, Alexander Kojevnikov wrote:
>
> All entities in a group are stored in the same datastore node.
>
> I guess this means that entities from the same group are stored close
> to each other. When your query uses "ANCESTOR IS", the query engine
> can take advantage of this. Just a speculation though...

this is definitely correct. the query engine can take advantage of
this in some kinds of queries, but most ancestor queries use a
composite (ie developer-defined) index. those queries don't take
advantage of the entity group locality.

another point i'd reiterate is that regardless of the filters,
ancestor, and sort orders you use, query performance should generally
be the same. a query consists of a single, bounded, bigtable scan over
an index table, along with individual row lookups for each result
entity. details in http://snarfed.org/space/datastore_talk.html. this
means that query performance will depend on the number and size of the
result entities fetched, but not (generally) on the query itself.

the one exception is merge join queries, ie queries that include only
equality filters, and maybe an ancestor, but no sort orders. those
queries are implemented using a somewhat more expensive algorithm.
it's still roughly O(n) in the number of result entities fetched, but
it has a higher constant factor. this is described in the slides
linked above, and has been discussed in more detail in other threads
on this group.
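
The two query shapes described above can be sketched in plain Python. This is a rough model only, not the datastore's code: the index is assumed to be a sorted list of (value, key) rows, entities a dict keyed by entity key, and the merge join is assumed to be a zigzag-style intersection of two sorted key lists. The names index_scan, run_query, and zigzag_merge_join are hypothetical.

```python
from bisect import bisect_left


def index_scan(index, value, limit):
    """One bounded scan: seek to the first row for `value`, then read
    consecutive rows until the value changes or `limit` keys are found."""
    start = bisect_left(index, (value,))
    keys = []
    for row_value, key in index[start:]:
        if row_value != value or len(keys) == limit:
            break
        keys.append(key)
    return keys


def run_query(index, entities, value, limit):
    """Index scan followed by one row lookup per result entity, so the
    work grows with the number of results, not with the table size."""
    return [entities[key] for key in index_scan(index, value, limit)]


def zigzag_merge_join(a, b):
    """Intersect two sorted key lists (one per equality filter) by
    alternately seeking each list forward to the other's current key.
    Returns the matching keys and the number of extra seeks performed."""
    matches, seeks = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i)  # jump a forward to b's key
            seeks += 1
        else:
            j = bisect_left(b, a[i], j)  # jump b forward to a's key
            seeks += 1
    return matches, seeks
```

The extra seeks counted by zigzag_merge_join are one way to picture the "higher constant factor": the work still scales with the rows visited, but each result may cost several seeks instead of one sequential read.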
[google-appengine] Re: "Ancestor is" performance
Thanks for the excellent reply Ryan!

On Jan 20, 3:54 pm, ryan wrote:
> On Jan 19, 7:10 am, Tony Arkles wrote:
>
> > In a thread [1], and in the documentation [2], it says that setting
> > ancestors doesn't affect performance, but I'm not sure that this is
> > the case.
> ...
> > The measured "ms-cpu" in the request logs comes out WAY smaller for
> > the "ANCESTOR IS" query (roughly 3,000 ms-cpu vs. 30,000 ms-cpu for
> > 1,000 entities, and roughly this same ratio for smaller queries)
>
> the important distinction to make here is between performance and
> cost. the thread and doc you cited are correct in that ancestor
> queries don't differ in performance from non-ancestor queries.
> however, they may differ in their CPU cost, which is what you noticed.
>
> On Jan 19, 3:42 pm, Alexander Kojevnikov wrote:
>
> > All entities in a group are stored in the same datastore node.
>
> > I guess this means that entities from the same group are stored close
> > to each other. When your query uses "ANCESTOR IS", the query engine
> > can take advantage of this. Just a speculation though...
>
> this is definitely correct. the query engine can take advantage of
> this in some kinds of queries, but most ancestor queries use a
> composite (ie developer-defined) index. those queries don't take
> advantage of the entity group locality.
>
> another point i'd reiterate is that regardless of the filters,
> ancestor, and sort orders you use, query performance should generally
> be the same. a query consists of a single, bounded, bigtable scan over
> an index table, along with individual row lookups for each result
> entity. details in http://snarfed.org/space/datastore_talk.html. this
> means that query performance will depend on the number and size of the
> result entities fetched, but not (generally) on the query itself.
>
> the one exception is merge join queries, ie queries that include only
> equality filters, and maybe an ancestor, but no sort orders. those
> queries are implemented using a somewhat more expensive algorithm.
> it's still roughly O(n) in the number of result entities fetched, but
> it has a higher constant factor. this is described in the slides
> linked above, and has been discussed in more detail in other threads
> on this group.