[google-appengine] Re: "Ancestor is" performance

2009-01-19 Thread Tony Arkles

Oops, missed the second reference:

[2] Section "Tips for using entity groups" in
http://code.google.com/appengine/docs/python/datastore/keysandentitygroups.html

On Jan 19, 9:10 am, Tony Arkles wrote:
> Hi everyone!
>
> In a thread [1], and in the documentation [2], it says that setting
> ancestors doesn't affect performance, but I'm not sure that this is
> the case.
>
> I set up two queries, one using "WHERE locationKey = :1" (locationKey
> is a db.StringProperty), and one using "WHERE ANCESTOR IS :1" (the
> ancestor is an entity created based on the locationKey).
>
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)
>
> Does anyone have any thoughts on this?  Did I mess something up, or is
> there something from the documentation, or is it something else
> entirely?
>
> Cheers
> Tony
>
> [1] http://groups.google.com/group/google-appengine/browse_thread/thread/...
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---



[google-appengine] Re: "Ancestor is" performance

2009-01-19 Thread Alexander Kojevnikov

> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)
>
> Does anyone have any thoughts on this?  Did I mess something up, or is
> there something from the documentation, or is it something else
> entirely?
>
From your second link:

  All entities in a group are stored in the same datastore node.

I guess this means that entities from the same group are stored close
to each other. When your query uses "ANCESTOR IS", the query engine
can take advantage of this. Just a speculation though...



[google-appengine] Re: "Ancestor is" performance

2009-01-20 Thread Tony Arkles



On Jan 19, 5:42 pm, Alexander Kojevnikov wrote:
> From your second link:
>
>   All entities in a group are stored in the same datastore node.
>
> I guess this means that entities from the same group are stored close
> to each other. When your query uses "ANCESTOR IS", the query engine
> can take advantage of this. Just a speculation though...

Yeah, one idea we tossed around at the office was that it knows
*which* node to visit to fetch the entities.  It'd be awesome to get
clarification about this, though; it looks like a pretty big
performance boost.

Additionally, the ms-cpu advantage is lost if you add an "ORDER BY"
clause to the GQL.



[google-appengine] Re: "Ancestor is" performance

2009-01-20 Thread ryan

On Jan 19, 7:10 am, Tony Arkles wrote:
>
> In a thread [1], and in the documentation [2], it says that setting
> ancestors doesn't affect performance, but I'm not sure that this is
> the case.
...
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)

the important distinction to make here is between performance and
cost. the thread and doc you cited are correct in that ancestor
queries don't differ in performance from non-ancestor queries.
however, they may differ in their CPU cost, which is what you noticed.

On Jan 19, 3:42 pm, Alexander Kojevnikov wrote:
>
>   All entities in a group are stored in the same datastore node.
>
> I guess this means that entities from the same group are stored close
> to each other. When your query uses "ANCESTOR IS", the query engine
> can take advantage of this. Just a speculation though...

this is definitely correct. the query engine can take advantage of
this in some kinds of queries, but most ancestor queries use a
composite (i.e. developer-defined) index. those queries don't take
advantage of the entity group locality.
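
as a rough illustration of the locality idea (a toy model only, not
the datastore's actual code): if rows are kept sorted by full key
path, everything under one ancestor forms a single contiguous range.
the `rows` table, the kind names, and `ancestor_scan` below are all
hypothetical.

```python
import bisect

# Toy entities table: rows sorted by full key path, as in a sorted
# key-value store. Key paths are tuples of (kind, name) pairs; every
# name here is made up for illustration.
rows = sorted([
    (("Location", "a"),),
    (("Location", "a"), ("Reading", "1")),
    (("Location", "a"), ("Reading", "2")),
    (("Location", "b"),),
    (("Location", "b"), ("Reading", "1")),
])


def ancestor_scan(rows, ancestor):
    """Return every key path that starts with `ancestor`.

    Because the rows are sorted, all matches sit in one contiguous range.
    """
    start = bisect.bisect_left(rows, ancestor)
    end = start
    while end < len(rows) and rows[end][:len(ancestor)] == ancestor:
        end += 1
    return rows[start:end]
```

for example, `ancestor_scan(rows, (("Location", "a"),))` returns the
ancestor itself plus its two children without touching the rows under
"b".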

another point i'd reiterate is that regardless of the filters,
ancestor, and sort orders you use, query performance should generally
be the same. a query consists of a single, bounded, bigtable scan over
an index table, along with individual row lookups for each result
entity. details in http://snarfed.org/space/datastore_talk.html . this
means that query performance will depend on the number and size of the
result entities fetched, but not (generally) on the query itself.
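
a minimal sketch of that shape, assuming an in-memory dict stands in
for the entities table and a sorted list stands in for the index
table (`entities`, `index`, and `run_query` are hypothetical names,
and the property values are made up):

```python
import bisect

# Toy entity store: entity key -> entity.
entities = {
    "k1": {"locationKey": "north", "temp": 3},
    "k2": {"locationKey": "south", "temp": 9},
    "k3": {"locationKey": "north", "temp": 5},
}

# Toy index on locationKey: sorted (property value, entity key) rows.
index = sorted((e["locationKey"], k) for k, e in entities.items())


def run_query(value, limit=1000):
    """One bounded scan over the index, then one lookup per result."""
    # Seek to the first index row whose property value is >= `value`.
    start = bisect.bisect_left(index, (value, ""))
    results = []
    for prop_value, key in index[start:]:
        # Stop at the end of the matching range or at the result limit.
        if prop_value != value or len(results) >= limit:
            break
        results.append(entities[key])  # individual row lookup
    return results
```

in this model the work grows with the number of results returned, not
with the total number of rows in the index.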

the one exception is merge join queries, i.e. queries that include only
equality filters, and maybe an ancestor, but no sort orders. those
queries are implemented using a somewhat more expensive algorithm.
it's still roughly O(n) in the number of result entities fetched, but
it has a higher constant factor. this is described in the slides
linked above, and has been discussed in more detail in other threads
on this group.
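
to give a feel for where the higher constant factor might come from,
here is a toy merge join, assuming each equality filter yields its own
sorted list of entity keys. this is an illustrative sketch, not the
datastore's implementation; `merge_join` and the sample keys are
hypothetical.

```python
import bisect


def merge_join(*key_lists):
    """Intersect sorted lists of entity keys, one list per equality filter."""
    if not key_lists or any(not keys for keys in key_lists):
        return []
    results = []
    candidate = key_lists[0][0]
    while True:
        agreed = True
        for keys in key_lists:
            # Seek to the first key >= candidate in this filter's list.
            pos = bisect.bisect_left(keys, candidate)
            if pos == len(keys):
                return results  # one list is exhausted: no more matches
            if keys[pos] != candidate:
                candidate = keys[pos]  # skip ahead, then re-check every list
                agreed = False
                break
        if agreed:
            results.append(candidate)
            # Move on to the next key after the match in the first list.
            pos = bisect.bisect_right(key_lists[0], candidate)
            if pos == len(key_lists[0]):
                return results
            candidate = key_lists[0][pos]
```

each result can cost several seeks across the lists instead of one
step in a single scan, which is one plausible reading of "higher
constant factor".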



[google-appengine] Re: "Ancestor is" performance

2009-01-21 Thread Tony Arkles

Thanks for the excellent reply Ryan!


