Yeah, sorry, you're right. With regard to a batch get, I'm not sure
there's any way to just get the itemB keys without dereferencing the
whole object.

A thought crossed my mind that you could try storing the itemB key id
or name as strings on itemA and then constructing a list of keys by
converting those strings back into keys, but I'm not sure whether that
works in practice.
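
Something like this is what I had in mind, though I haven't tried it
myself. Just a rough sketch, assuming the itemBs are saved with key
names and that itemA keeps that name in a made-up itemB_key_name
StringProperty:

from google.appengine.ext import db

def fetch_item_bs(itemAs):
    # Rebuild each itemB key from the stored key name, no dereference needed.
    keys = [db.Key.from_path('ItemB', a.itemB_key_name) for a in itemAs]
    # One batch get for all the itemBs instead of one read per itemA.
    return db.get(keys)

If your itemBs only have numeric ids rather than key names, you'd
store the ids instead and pass the int to from_path.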

The reason I wondered about the make-up of the itemB type was to see
whether it might be possible to store the data directly with itemA, so
you only have to fetch itemAs. But I guess that isn't desirable for
your app.
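
For what it's worth, the sort of thing I meant is below. The field
names are entirely made up, just to show the shape of it:

from google.appengine.ext import db

class ItemA(db.Model):
    intProperty = db.IntegerProperty()
    date = db.DateTimeProperty()
    itemB = db.ReferenceProperty()  # keep the reference if you still need the full entity
    # Copies of the itemB fields you actually serve, written alongside
    # itemA, so the common read path never has to touch itemB at all:
    b_name = db.StringProperty()
    b_count = db.IntegerProperty()

The cost is keeping those copies in sync whenever an itemB changes,
which is why it only really pays off if the itemB data is small and
doesn't change much.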

I'm intrigued by your comment regarding direct read performance on
itemB. Those issues with itemB reads didn't just crop up in the last
week or so, did they? GAE was having problems last week, so
performance results from then may not have been typical. What kind of
query were you using to read the itemBs? Every time you do itemA.itemB
you're doing a read on your itemBs, so a single read done directly on
an itemB shouldn't be massively different from that in terms of
performance.
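
What I mean by that, roughly (itemA and item_b_key here are just
placeholders for your own entity and key):

from google.appengine.ext import db

# Dereferencing the ReferenceProperty costs one datastore read for that itemB...
b = itemA.itemB

# ...and fetching an itemB directly by its key is also a single read, so the
# two should cost about the same.
b = db.get(item_b_key)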

A survey of read cost as entity numbers increase would indeed be
interesting, though. I've been making the assumption that read speed
is roughly constant regardless of the number of entities underneath,
but that is just an assumption. Maybe someone from Google could chime
in with the expected read performance with millions of entities versus
a few.

On Mar 10, 4:42 pm, lenza <le...@lenza.org> wrote:
> Thanks for the response peter.  It is my understanding that a
> ReferenceProperty IS actually just a Key
> (http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#ReferenceProperty).
> If it isn't, how do I get the key of itemB without dereferencing
> itemB?  There is no KeyProperty.
>
> I actually do need all the itemB's though.  The next thing I do is
> memcache the list of itemB's and then call to_xml() on each itemB to
> send to the client.  The itemB's are more complex types, but not too
> bad.  Maybe 20 fields of type int/string/bool.  The total itemB size
> is not more than 512 bytes.
>
> I would be very interested if anyone could offer some clarification on
> how the datastore scales with number of entities.  Half of the reason
> I have this database scheme is that queries directly on itemB were
> even slower or would fail with db.Timeout.  And that was doing only
> "one read".
>
> On Mar 10, 5:07 am, peterk <peter.ke...@gmail.com> wrote:
>
> > Doh, I just thought of this as one potential simple way to improve
> > your itemB dereferencing:
>
> >http://code.google.com/appengine/docs/python/datastore/modelclass.htm...
>
> > Create a list of the keys to your itemBs without dereferencing them,
> > then pass the list to itemB.get(). I think this performs one single
> > batch get..so it should be faster than 36 separate reads.
>
> > Try that out and let me know if it improves performance.. :)
>
> > On Mar 10, 12:03 pm, peterk <peter.ke...@gmail.com> wrote:
>
> > > As far as I know:
>
> > > query = A.all().filter('intProperty', val).filter('date >=', date)
>
> > > This is one read. Hence the relatively fast performance for getting
> > > itemAs.
>
> > > But this:
>
> > > itemBs = [itemA.typeB for itemA in itemAs]
>
> > > ..is n reads, where n is the number of itemAs you have. In your case,
> > > an extra 36 reads.
>
> > > That lines up with the figures you're providing..0.4s to get all
> > > itemAs, then 2.2 for the (36) itemB reads.
>
> > > There may be more optimal ways others can share for dereferencing your
> > > itemBs to reduce the number of reads required, but in terms of
> > > scalability..if the number of itemAs returned will remain the same,
> > > then your execution time for this computation should remain the same
> > > regardless of the size of the database underneath. Datastore reads
> > > are, I think, supposed to be fairly constant regardless of the number
> > > of entities..so if you're always doing 36+1 reads then the cost should
> > > always be roughly the same regardless of how many itemAs and itemBs
> > > are in the datastore. I could be wrong on that..but that's my
> > > expectation.
>
> > > In terms of optimising your itemB dereferencing..do you need all those
> > > itemBs at the point you're adding them to your list at the moment?
> > > What is itemB made up of?
>
> > > On Mar 10, 1:23 am, lenza <le...@lenza.org> wrote:
>
> > > > App Engine.  Here is the situation:
>
> > > > I have a database with about 50k rows of type A, and maybe 500k
> > > > rows of type B.
>
> > > > Type A contains:
> > > >   - ReferenceProperty (to type B)
> > > >   - DateTimeProperty
> > > >   - IntegerProperty
> > > >   - 3 x StringProperty (at most 5 bytes each)
>
> > > > I am performing the following query:
>
> > > > def myQuery():
> > > >    query = A.all().filter('intProperty', val).filter('date >=', date)
> > > >    itemAs = [itemA for itemA in query]
> > > >    itemBs = [itemA.typeB for itemA in itemAs]
> > > >    return itemBs
>
> > > > Even under no load, when returning a modest number of elements, this
> > > > function takes so long!  For example, I'm looking at a log entry where
> > > > for 36 items the database took 0.4s to create the list of itemAs and
> > > > 2.2s to create the list of itemB's.  Does this seem excessive?  What
> > > > can I do to speed this up?  I was planning for these databases to grow
> > > > 10x in size (although the queries will return the same number of
> > > > items) so this is concerning.