Thanks, Brandon.  According to the stack traces, many of the 
DeadlineExceededErrors occurred during warmup requests, inside Python 
import statements.  I upped the number of idle instances in an attempt to 
mitigate this sort of thrashing, and your advice makes sense for this case.  
Our pending latency is set to 'Automatic' on both ends.

I'm attaching some graphs from the period when this was at its worst.

Instances:

<https://lh4.googleusercontent.com/--AtYMbWJ4ek/TxGNT3nfp0I/AAAAAAAAUuE/hTlZm78Mc08/s1600/Screen%252520Shot%2525202012-01-14%252520at%2525209.08.59%252520AM.png>

Requests per second:

<https://lh6.googleusercontent.com/-LoIlwGhvLrA/TxGOnvzGmSI/AAAAAAAAUuc/Sg07YssPK_4/s1600/Screen%252520Shot%2525202012-01-14%252520at%2525209.17.39%252520AM.png>



Milliseconds per request:

<https://lh5.googleusercontent.com/-A76zVs8CCEo/TxGNZ9kcpfI/AAAAAAAAUuQ/w20AuPvgw50/s1600/Screen%252520Shot%2525202012-01-14%252520at%2525209.09.41%252520AM.png>


This suggests that some higher-latency handlers were hit (some people were 
editing content), tying up the existing frontend instances, after which 
GAE tried to spin up some dynamic instances to serve other requests.  But 
during warmup there were DeadlineExceededErrors during file imports, 
suggesting that the dynamic instances aren't being given enough time to 
warm up.
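
One way to check whether imports really dominate warmup time is to time them 
individually.  The following is only a minimal sketch using the standard 
library; the helper name time_imports and the module list are hypothetical 
placeholders, and the real list would come from the modules named in the 
stack traces.

```python
import importlib
import time


def time_imports(module_names):
    """Import each module and return (name, seconds) pairs, slowest first.

    A module that is already loaded returns almost instantly, so run this
    in a fresh process to approximate a cold instance start.
    """
    timings = []
    for name in module_names:
        start = time.time()
        importlib.import_module(name)
        timings.append((name, time.time() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical module list; substitute the modules from the stack traces.
    for name, seconds in time_imports(["json", "decimal", "email"]):
        print("%-10s %.1f ms" % (name, seconds * 1000.0))
```

Numbers measured locally will not match production, but the relative 
ordering may show which imports are worth deferring or trimming.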

Increasing the idle instances helps.  So perhaps the revised question, at 
least for our particular situation, is: why, under load, do the dynamic 
instances time out during warmup?  That seems to compound the problem, as 
the dynamic instances aren't able to serve the requests that are backed up, 
leading to user-visible 500 errors and more attempts to dynamically load 
instances.
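
To put rough numbers on how a latency jump turns into demand for new 
instances, here is a back-of-the-envelope sketch based on Little's law 
(concurrent requests = arrival rate x time per request).  It assumes each 
instance serves one request at a time, which may not match every runtime 
configuration, and the request rates and latencies are made-up values, not 
readings from the graphs above.

```python
import math


def instances_needed(requests_per_second, ms_per_request):
    """Estimate how many busy instances a steady load requires.

    Assumption: one request at a time per instance, no queueing slack.
    """
    concurrent = requests_per_second * ms_per_request / 1000.0
    return int(math.ceil(concurrent))


if __name__ == "__main__":
    # Hypothetical figures: 10 req/s at 200 ms, then the same rate at 2000 ms.
    print(instances_needed(10, 200))   # -> 2
    print(instances_needed(10, 2000))  # -> 20
```

Under these assumptions a tenfold rise in latency means a tenfold rise in 
instances needed at the same request rate, so any instance that fails 
during warmup leaves that gap open for longer.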

Does my theory have any holes?  Is relying on dynamic instances to handle 
spikes without 500 errors unrealistic?  I know the docs state, "A smaller 
number of idle Instances means your application costs less to run, but may 
encounter more startup latency during load spikes," but thrashing on 
DeadlineExceededErrors during warmup seems to indicate that dynamic 
instances can't be relied upon for load spikes at all right now.

