A very good point re. Google changing the CPU usage formula.
Unfortunately we're the ones who pay when Google changes the formula. It
would be great if they could tell us whether that is what happened, and why.

Your points on what affects api_cpu usage the most seem spot on.

I've stripped my indexed fields back to the bare minimum and stopped
using run_in_transaction() (a big CPU hit) to try to reduce the
api_cpu usage. But the reported values are still much higher now than
they were previously, when all that stuff was still in!
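
For reference, this is roughly what the change looked like (a minimal
sketch with made-up model and property names, not my actual code):

from google.appengine.ext import db

class Item(db.Model):
    # Only the field I actually query on stays indexed.
    name = db.StringProperty()
    # Everything else is stored but not indexed, which cuts the
    # index-write cost on every put().
    notes = db.TextProperty()  # TextProperty is never indexed
    score = db.IntegerProperty(indexed=False)
    updated = db.DateTimeProperty(indexed=False, auto_now=True)

def bump_score(key, delta):
    # Previously wrapped in db.run_in_transaction(); now a plain
    # get/put, accepting the (small) risk of a lost update.
    item = db.get(key)
    item.score += delta
    item.put()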


On Sep 28, 11:18 pm, Peter Liu <tinyee...@gmail.com> wrote:
> After looking at the reported api_cpu usage numbers for so long, I am
> convinced that the usage is estimated with a formula or model. It
> varies over time, maybe depending on server condition in that time
> period.
>
> For example:
> 1 put -> 66ms
> 10 batch puts of the same object -> 666ms
> 100 batch puts of the same object -> 6666ms
>
> Those are real numbers reported. I see plenty of logs that have
> similar pattern.
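>
> To be clear, by "batch put" I mean passing a list to db.put() rather
> than calling put() in a loop; something like this (model name made up):
>
> from google.appengine.ext import db
>
> class Thing(db.Model):
>     value = db.StringProperty()
>
> # 1 put -> ~66ms reported
> db.put(Thing(value='x'))
>
> # 10 batch put -> ~666ms reported, i.e. exactly 10x a single put
> db.put([Thing(value='x') for _ in range(10)])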
>
> From my observation, the reported usage in ms depends on these variables:
> 1. The API call itself; a batch put is just a multiple of the CPU of a
> single put
> 2. The number of fields that need to be updated (for new objects,
> most fields, which is my guess as to why new objects use far more CPU
> than a subsequent update)
> 3. Whether the fields are indexed (this can be major for a wide schema,
> since by default all indexable fields are indexed)
>
> Very surprisingly, it doesn't depend much on the object size. Say you
> have kinds with blobs of 1k, 10k, or 100k: each of those uses the same
> api_cpu. In fact, a blob of 100k might take less CPU than a string
> field that's indexed.
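>
> To make that concrete, these are the two shapes I'm comparing (a
> sketch, kind names invented):
>
> from google.appengine.ext import db
>
> class WithBlob(db.Model):
>     # BlobProperty is never indexed, so even ~100k of data adds
>     # little to the reported api_cpu of a put().
>     payload = db.BlobProperty()
>
> class WithString(db.Model):
>     # StringProperty is indexed by default; the index writes can
>     # cost more api_cpu than storing a much larger blob.
>     label = db.StringProperty()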
>
> This is one of the oddities that I hope Google will clarify. Developers
> will optimize based on those reported numbers (after all, that's all we
> can observe). The current way api_cpu is calculated encourages
> developers to bias towards practices that minimize that number, simply
> because it's the bottleneck (and, again, it's all we can see on the
> quota dashboard).
>
> If Google doesn't want to release the formula, they should at least
> tell us whether it will change in the future. I'd hate to see people's
> optimization effort go to waste because the formula changed.
>
> To answer your question about why the usage increased without any code
> change: my guess is that the CPU usage formula was changed. I doubt the
> datastore suddenly became less efficient and started using 3x the CPU.
>
> On Sep 28, 2:46 pm, herbie <4whi...@o2.co.uk> wrote:
>
> > GAE doesn't suck.   But...
>
> > I've really enjoyed building my first GAE project. It does almost
> > everything I want it to do, and so far it seems responsive and
> > reliable enough. But it is expensive in terms of api_cpu, which is
> > billable. The vast majority of my quota is used making API calls, in
> > particular writing to the datastore. Reading/writing to the GAE
> > datastore is too expensive (especially relative to the other billable
> > services).
>
> > I have to trust Google to make these api calls as efficient as
> > possible, as I can't change them.
>
> > What concerns me most is that I recently experienced a sudden and
> > dramatic increase in api_cpu usage with no changes to my code (see
> > http://groups.google.com/group/google-appengine/browse_thread/thread/...
> > ). The api_cpu values have remained much higher than before, and I
> > still have no idea why. I haven't released my app yet, but if I had,
> > my bill would have doubled overnight - with no changes at my end!
>
> > Could Google work on optimising reading and writing to the datastore,
> > and/or reduce the billing on api_cpu?
>
> > On Sep 28, 3:49 pm, Jeroen <aliq...@gmail.com> wrote:
>
> > > I did have indexed=False in strategic places ;) However, 5 properties
> > > were ListProperties (updated hourly).
> > > It's mostly annoying that something I have very little control over
> > > turned out to be the most CPU-intensive part of my application.
>
> > > On 28 sep, 02:31, GregF <g.fawc...@gmail.com> wrote:
>
> > > > I'm not seeing the same issues as you - I have more objects but you
> > > > didn't specify how frequently you update, so maybe that's where the
> > > > difference is.
>
> > > > Dumb (of me if you have, of you if you haven't!) question - have you
> > > > added "indexed=False" to your model properties? This can make an
> > > > enormous difference.
>
> > > > On Sep 27, 1:37 am, Jeroen <aliq...@gmail.com> wrote:
>
> > > > > Biggest problem: the datastore is dead slow and uses insane amounts
> > > > > of CPU. I found two ways around it, backwards ones IMHO, but if it
> > > > > works, it works.
> > > > > Maybe my use case is unique, as it involves frequent updates to the
> > > > > stored data (10k records).
>
> > > > > 1st solution:
> > > > > Only write to the datastore on every second update, and keep the
> > > > > intermediate data in memcache (e.g.: 1) store in datastore & put in
> > > > > cache, 2) fetch from cache, update cache (if not in cache, update
> > > > > datastore instead), 3) store in datastore and update cache, 4) fetch
> > > > > from cache, update cache (or datastore if not cached), 5) datastore,
> > > > > 6) cache, 7) ... etc.)
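>
> > > > > In code the idea is roughly this (a sketch with made-up model and
> > > > > key names, not my actual code):
>
> > > > > from google.appengine.api import memcache
> > > > > from google.appengine.ext import db
>
> > > > > class Record(db.Model):
> > > > >     data = db.TextProperty()
>
> > > > > def save(record_id, data):
> > > > >     # Alternate between datastore+cache writes and cache-only writes,
> > > > >     # so the datastore is only hit on every second update.
> > > > >     toggle_key = 'cache_only:%s' % record_id
> > > > >     if memcache.get(toggle_key):
> > > > >         # Last update went to the datastore; keep this one cache-only.
> > > > >         if memcache.set(record_id, data):
> > > > >             memcache.delete(toggle_key)
> > > > >         else:
> > > > >             # Cache write failed, fall back to the datastore.
> > > > >             Record(key_name=record_id, data=data).put()
> > > > >     else:
> > > > >         # Persist to the datastore and refresh the cache.
> > > > >         Record(key_name=record_id, data=data).put()
> > > > >         memcache.set(record_id, data)
> > > > >         memcache.set(toggle_key, True)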
>
> > > > > 2nd solution:
> > > > > Store the non-indexed data (about 10 fields) in one big blob that you
> > > > > serialize when storing and deserialize when reading.
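>
> > > > > Roughly like this (a sketch; the kind and field names are invented):
>
> > > > > import pickle
> > > > > from google.appengine.ext import db
>
> > > > > class Packed(db.Model):
> > > > >     # The one field I actually query on stays a normal property.
> > > > >     name = db.StringProperty()
> > > > >     # Everything else is pickled into a single unindexed blob.
> > > > >     blob = db.BlobProperty()
>
> > > > > def store(name, fields):
> > > > >     Packed(key_name=name, name=name,
> > > > >            blob=db.Blob(pickle.dumps(fields, 2))).put()
>
> > > > > def load(name):
> > > > >     entity = Packed.get_by_key_name(name)
> > > > >     return pickle.loads(entity.blob) if entity else None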
>
> > > > > Both work fairly well (combining both methods reduced CPU usage by
> > > > > over 50%), but they are crippled by App Engine.
>
> > > > > The 1st method needs a more reliable memcache, or at least its limits
> > > > > need to be clear (there have been moments it could only hold 8k items
> > > > > (about 10mb of data in total), and moments it would hold 20k (adding
> > > > > up to about 30mb); when it only holds 8k, data gets lost. Of course
> > > > > the nature of a cache is that it can lose data, but it would be nice
> > > > > if it behaved in a predictable way).
>
> > > > > The 2nd method needs a well-performing serialization mechanism. For
> > > > > Python the obvious choice is pickle (which I'm using), but in all its
> > > > > wisdom Google decided not to include cPickle, so performance is
> > > > > terrible. (YAML yielded even worse results, as the C extension needed
> > > > > to speed it up isn't available.) (Another option might be protocol
> > > > > buffers, but those don't work on App Engine - the google package in
> > > > > which the Python code resides is locked or something.)
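>
> > > > > If cPickle ever does show up on the runtime, the standard fallback
> > > > > import would pick it up without any other changes (a small sketch):
>
> > > > > try:
> > > > >     import cPickle as pickle  # C implementation, much faster when present
> > > > > except ImportError:
> > > > >     import pickle  # pure-Python fallback, what we get today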
>
> > > > > All this gives me the feeling that I'm forced to pay CPU costs that
> > > > > shouldn't be there:
> > > > > - I didn't ask for a dead-slow Bigtable datastore (it really is that
> > > > > damned datastore that's still eating half my CPU usage)
> > > > > - I try to optimize, but the tools for it are crippled
>
> > > > > I fully understand the success story around serving static content.
> > > > > But for dynamic content, for future projects, I'll happily not try
> > > > > using App Engine anymore.