[google-appengine] Re: Expected Database Performance with Millions of Rows?

ryan Tue, 10 Mar 2009 14:47:51 -0700

On Mar 10, 11:10 am, peterk <peter.ke...@gmail.com> wrote:

> A survey of read cost as entity numbers increase would indeed be
> interesting however..I've been making the assumption that read speed
> is roughly constant regardless of the number of entities
> underneath...but that is just an assumption. Maybe a google person
> could chime in with expected read performance with millions of
> entities vs a few.


that's the correct assumption. query performance will vary based on a
number of factors, but datastore size generally isn't one of them.
queries should perform roughly the same regardless of the number of
entities in your datastore.

there have already been a number of good threads on datastore
performance. check out a few of the ones i've weighed in on:

http://groups.google.com/group/google-appengine/search?group=google-appengine&q=query+performance+ryanb&qt_g=Search+this+group

in particular:

http://groups.google.com/group/google-appengine/browse_thread/thread/138f2dc6fc7c9d0d/ad7c36d2c27764f4#ad7c36d2c27764f4
http://groups.google.com/group/google-appengine/browse_thread/thread/2e12581d6b518c3a/3717db62e1da5b80#3717db62e1da5b80
http://groups.google.com/group/google-appengine/browse_thread/thread/3cc2ccb87b29369

> As far as I know:
>
> query = A.all().filter('intProperty', val).filter('date >=', date)
>
> This is one read. Hence the relatively fast performance for getting
> itemAs.
>
> But this:
>
> itemBs = [itemA.typeB for itemA in itemAs]
>
> ..is n reads, where n is the number of itemAs you have. In your case,
> an extra 36 reads.

actually, no, they're both the exact same thing under the hood.

queries will always require at least n disk seeks and reads, where n
is the number of results you get back. if you use fetch(), they'll
only need one extra disk seek to read the index. if you don't use
fetch(), the query will read the index and fetch result entities in
batches, as you iterate over the results. currently, the batch size is
20.

given that, if you know ahead of time how many results you want, you
should usually use fetch().
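
here's a rough back-of-the-envelope model of the seek counts above, using the 36 results from the quoted example as n. this is a toy, not the datastore's real implementation. the function names are hypothetical, and charging one index seek per batch of 20 when iterating is an assumption read into the description above.

```python
import math

BATCH_SIZE = 20  # batch size quoted above for iterating over a query


def seeks_with_fetch(n):
    """toy model: one seek to read the index, then one per result entity."""
    return 1 + n


def seeks_when_iterating(n, batch_size=BATCH_SIZE):
    """toy model (assumption): each batch re-reads the index once,
    plus one seek per result entity. an empty query still reads the index."""
    batches = math.ceil(n / batch_size) if n else 1
    return batches + n


for n in (10, 36, 100):
    print(n, seeks_with_fetch(n), seeks_when_iterating(n))
# 10 11 11
# 36 37 38
# 100 101 105
```

the per-seek difference is small. the bigger practical cost of iterating is likely that each batch is a separate round trip to the datastore, which fetch() avoids when you already know how many results you want.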

> Yeah, sorry, you're right..with regard to a batch get, I'm not sure
> there's any way to just get the itemB keys without dereferencing the
> whole object..

there is:

http://code.google.com/appengine/docs/python/datastore/propertyclass.html#Property_get_value_for_datastore

people have discussed this in more detail in other threads.
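
to see why reading the raw keys matters, here's a toy sketch that counts datastore calls. with the real sdk, something like `A.typeB.get_value_for_datastore(itemA)` returns the stored key without fetching itemB, and `db.get(keys)` fetches the whole list in one batch call. the FakeDatastore class, its rpc counter, and the dict-based entities below are hypothetical stand-ins so the example runs without the sdk.

```python
class FakeDatastore:
    """hypothetical in-memory stand-in that counts datastore calls (rpcs)."""

    def __init__(self, entities):
        self.entities = dict(entities)  # key -> entity
        self.rpcs = 0

    def get(self, key):
        # one call per single-entity get, like dereferencing itemA.typeB
        self.rpcs += 1
        return self.entities[key]

    def get_multi(self, keys):
        # one call for the whole list, like db.get(list_of_keys)
        self.rpcs += 1
        return [self.entities[k] for k in keys]


# hypothetical data: 36 itemAs, each holding the raw key of an itemB
store = FakeDatastore({("B", i): {"id": i} for i in range(36)})
item_as = [{"typeB_key": ("B", i)} for i in range(36)]

# dereferencing one reference at a time: one call per itemA
store.rpcs = 0
item_bs_slow = [store.get(a["typeB_key"]) for a in item_as]
slow_rpcs = store.rpcs

# read the stored keys first (no datastore call), then one batch get
store.rpcs = 0
keys = [a["typeB_key"] for a in item_as]
item_bs_fast = store.get_multi(keys)
fast_rpcs = store.rpcs

print(slow_rpcs, fast_rpcs)  # 36 1
```

the batch get still reads all 36 entities. what it saves is 35 separate round trips.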

> A thought crossed my mind that you could try storing the key id or
> name as strings in itemA and then construct a list of keys by casting
> the id strings to keys, but i'm not sure if that's possible..

definitely!

http://code.google.com/appengine/docs/python/datastore/keyclass.html#Key_from_path
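
one gotcha if you go this route: Key.from_path treats an int as a numeric id and a str as a key name, so an id stored as a string needs converting back with int() first. passed unchanged, it would be read as a key name, not the numeric id. with the real sdk that's roughly `db.Key.from_path('B', int(id_str))`. the key_from_path helper below is a hypothetical pure-python stand-in that mimics only that int-vs-str distinction and returns a plain tuple instead of a real Key.

```python
def key_from_path(kind, id_or_name):
    """hypothetical stand-in for db.Key.from_path: an int is a numeric id,
    a str is a key name. returns a plain tuple instead of a real Key."""
    if isinstance(id_or_name, int):
        return (kind, "id", id_or_name)
    if isinstance(id_or_name, str):
        return (kind, "name", id_or_name)
    raise TypeError("id_or_name must be int or str")


stored_ids = ["12", "345"]  # hypothetical ids saved as strings on itemA

wrong = [key_from_path("B", s) for s in stored_ids]       # key names "12", "345"
right = [key_from_path("B", int(s)) for s in stored_ids]  # numeric ids 12, 345

print(wrong[0], right[0])  # ('B', 'name', '12') ('B', 'id', 12)
```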

regardless, in this case, you should just store the keys directly, in
a ReferenceProperty, as discussed.
