Some more details from searching the logs. The latest burst (we get one about every 1-2 hours) looks like this: we had about 90 instances up, and we got DeadlineExceeded errors for two minutes. During the first 30 seconds, 50 successive failing requests had no pending_ms. During the following 90 seconds, nearly all of them had pending_ms, and one third were loading requests.
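To check whether that split repeats from burst to burst, the failing requests can be tallied by whether they carry pending_ms and loading_request=1. This is a minimal Python sketch, assuming the error entries have been exported as plain-text lines containing key=value fields like the ones quoted in this thread (ms=100878 ... pending_ms=9321). The helper names parse_log_fields and summarize_burst are hypothetical; this is not an App Engine API.

```python
import re
from typing import Dict, Iterable

# Matches numeric key=value pairs such as "ms=100878" or "cpm_usd=0.010945".
# The field names are the ones visible in this thread; adjust the pattern
# if your exported log lines look different.
_FIELD_RE = re.compile(r"\b([a-z_]+)=([0-9]+(?:\.[0-9]+)?)")


def parse_log_fields(line: str) -> Dict[str, float]:
    """Return the numeric key=value fields found in one request-log line."""
    return {key: float(value) for key, value in _FIELD_RE.findall(line)}


def summarize_burst(error_lines: Iterable[str]) -> Dict[str, int]:
    """Tally failing requests that waited in the queue or were loading requests."""
    summary = {"requests": 0, "with_pending": 0, "loading": 0}
    for line in error_lines:
        fields = parse_log_fields(line)
        summary["requests"] += 1
        if fields.get("pending_ms", 0.0) > 0:
            summary["with_pending"] += 1
        if fields.get("loading_request") == 1.0:
            summary["loading"] += 1
    return summary


# One line taken from this thread:
fields = parse_log_fields(
    "ms=100878 cpu_ms=385 api_cpu_ms=58 cpm_usd=0.010945 pending_ms=9321")
print(fields["pending_ms"])  # 9321.0
```

Running summarize_burst separately over the first 30 seconds and the following 90 seconds of each burst would show whether the pattern (no pending_ms first, then mostly pending_ms with about a third loading requests) holds every time.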
So my guess is that it's not linked to the scheduler: something goes wrong, requests kill the instances during the first 30 seconds, and then they keep killing the remaining ones, with increased latency, until whatever is going wrong is resolved.

On 3 Aug, 10:20, Alexis <alexis.hanico...@gmail.com> wrote:
> We are using Python, and we are not using backends or the task queue.
> But most of our requests fetch entities of the same kind, so
> "kind-locking" could be relevant.
> I'll set up timeouts on datastore operations to see whether this is
> what's going wrong (it seems a good idea to set these up anyway).
>
> It seems logical to see pending_ms values if the app has no instances
> available instead of the hundred that used to serve the traffic...
> however, requests should still be able to complete in less than 20 seconds.
> Nearly all the requests that fail with this error while the instances
> are being killed off have this pending_ms value and don't have
> loading_request=1.
> So I'm not sure whether these DeadlineExceeded errors come first or are
> a consequence of the instances being killed: they are not loading
> requests, and the warning in the logs says it may kill the instance, but
> the pending_ms values show that we were already short of instances.
>
> The traffic is very steady.
>
> On 3 Aug, 05:49, Robert Kluin <robert.kl...@gmail.com> wrote:
> > Interesting. I've been seeing exactly the same strange behavior across
> > several apps as well. Suddenly instances get killed and
> > restarted in large batches. This happens even with low request
> > latency, small memory usage (similar to yours, < 50 MB), low error
> > rates, and steady traffic. I'm pretty convinced this is tied to the
> > scheduler changes they've been making over the past few weeks.
> >
> > As a side note, the pending_ms value (9321) indicates that the request
> > sat there waiting to be serviced for quite a long time. That won't
> > leave as much time to respond to the request.
> > Do you always see bursts of those when your instances get killed off?
> > Are you getting big spikes in traffic when this happens, or is it steady?
> >
> > Robert
> >
> > On Tue, Aug 2, 2011 at 05:24, Alexis <alexis.hanico...@gmail.com> wrote:
> > > Hi,
> > >
> > > I've got a similar issue: lots of DeadlineExceeded errors for the past
> > > few weeks. I'm on the master-slave datastore too, but what I'm reporting
> > > happened again one hour ago.
> > >
> > > These errors happen in bursts, and I recently realized that they were
> > > in fact shutting down ALL instances of the application.
> > > (In the logs, I also have this warning: A serious problem was
> > > encountered with the process that handled this request, causing it to
> > > exit. This is likely to cause a new process to be used for the next
> > > request to your application. If you see this message frequently, you
> > > may be throwing exceptions during the initialization of your
> > > application. (Error code 104))
> > > This does not happen while an instance is spinning up, but after
> > > several hours.
> > >
> > > The traces I get with the DeadlineExceeded errors show that they
> > > happen in a second phase: while the app is trying to fall back
> > > gracefully because of another error (which does not appear in the logs).
> > > The reported processing time for a request can look like this:
> > > ms=100878 cpu_ms=385 api_cpu_ms=58 cpm_usd=0.010945 pending_ms=9321
> > >
> > > Here is a screenshot of the admin page, showing that all instances,
> > > even the resident ones, were shut down about 7 minutes ago:
> > > http://dl.dropbox.com/u/497622/spinDown.png
> > >
> > > The app does work in batches (although not always small ones), but
> > > request processing time is usually good enough (see the average latency
> > > in the screenshot).
> > > I'm trying things on my test applications to see what could be wrong,
> > > but it's still not clear to me and I'm running out of ideas...
> > >
> > > Any suggestions?
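Two ideas from this exchange can be sketched together: bounding how long a single call may block (the datastore timeouts Alexis plans to set up) and checking how much of the request deadline is left before starting the graceful-fallback path, counting time already spent queued (pending_ms). This is a stdlib-only Python sketch under stated assumptions: the deadline value is an assumption, treating queued time as part of that deadline follows Robert's remark rather than documented platform behavior, and RequestBudget and call_with_timeout are hypothetical helpers. The App Engine SDK's own datastore deadline settings are not shown.

```python
import concurrent.futures
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RequestBudget:
    """Tracks how much of an assumed per-request deadline is left.

    deadline_s is an assumption for illustration, and counting queued time
    (pending_ms) against it is also an assumption, not documented behavior.
    """

    def __init__(self, deadline_s: float, already_waited_s: float = 0.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        # Subtract the time the request already spent waiting in the queue.
        self._available_s = deadline_s - already_waited_s

    def remaining(self) -> float:
        """Seconds left before the assumed deadline (never negative)."""
        elapsed = self._clock() - self._start
        return max(0.0, self._available_s - elapsed)

    def can_afford(self, estimate_s: float, margin_s: float = 1.0) -> bool:
        """True if work estimated at estimate_s still fits, keeping a safety margin."""
        return self.remaining() >= estimate_s + margin_s


def call_with_timeout(fn: Callable[[], T], timeout_s: float) -> Optional[T]:
    """Wait at most timeout_s for fn(); return None if it has not finished.

    The worker thread is not cancelled: on timeout it keeps running in the
    background, so this only bounds how long the caller waits. A None result
    is ambiguous if fn itself can return None.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Do not block on the (possibly still running) worker.
        pool.shutdown(wait=False)
```

For example, with an assumed 30 s deadline and pending_ms=9321, a request that is 15 s into processing has about 5.7 s left, so `can_afford(5.0)` returns False with the default 1 s margin, and the fallback path could return a cheap error response instead of starting more datastore work.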
> > >
> > > On 2 Aug, 06:21, Robert Kluin <robert.kl...@gmail.com> wrote:
> > >> Hi Will,
> > >> I assume this is on the master-slave datastore? I think there were
> > >> a number of large latency spikes in both the datastore and serving
> > >> last week.
> > >>
> > >> Some things to try:
> > >> - do work in smaller batches.
> > >> - if you're doing work serially, do it in batches.
> > >> - use the async interfaces to do work in batches, but in parallel.
> > >>
> > >> http://code.google.com/appengine/docs/python/datastore/async.html
> > >>
> > >> Robert
> > >>
> > >> On Fri, Jul 29, 2011 at 18:35, Will Reiher <wrele...@gmail.com> wrote:
> > >> > I'm trying to debug this issue but I keep hitting a wall.
> > >> > I keep trying new things on one of my deployments to see if I can get
> > >> > the number of errors down, but nothing seems to help. It all started
> > >> > a week or so ago. I also have some existing deployments that I have
> > >> > not changed, and they are seeing the same errors even though their
> > >> > code was never changed and had been stable.
> > >> > 1. This is happening on existing code that has not changed recently.
> > >> > 2. The DeadlineExceededErrors are coming up randomly and at different
> > >> > points in the code.
> > >> > 3. Latency is pretty high, and App Engine seems to be spawning a lot
> > >> > of new instances beyond my 3 included ones.
> > >> >
> > >> > --
> > >> > You received this message because you are subscribed to the Google
> > >> > Groups "Google App Engine" group.
> > >> > To view this discussion on the web visit
> > >> > https://groups.google.com/d/msg/google-appengine/-/g_C4iPzPeo4J.
> > >> > To post to this group, send email to google-appengine@googlegroups.com.
> > >> > To unsubscribe from this group, send email to
> > >> > google-appengine+unsubscr...@googlegroups.com.
> > >> > For more options, visit this group at
> > >> > http://groups.google.com/group/google-appengine?hl=en.
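Robert's suggestions (small batches, run in parallel) can be sketched without the SDK. In this Python sketch a thread pool stands in for the datastore's async calls described at the async.html link he gives; fetch_batch is a hypothetical callable that loads one batch of keys, and the default batch size and worker count are arbitrary placeholders, not recommended values.

```python
import concurrent.futures
from typing import Callable, Iterator, List, Sequence, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def chunked(items: Sequence[K], size: int) -> Iterator[Sequence[K]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_parallel(keys: Sequence[K],
                      fetch_batch: Callable[[Sequence[K]], List[V]],
                      batch_size: int = 100,
                      max_workers: int = 4) -> List[V]:
    """Split keys into small batches and run the batches concurrently.

    fetch_batch is a hypothetical callable that loads one batch; results
    come back in the same order as `keys`.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of which
        # batch finishes first.
        batch_results = pool.map(fetch_batch, chunked(keys, batch_size))
        return [value for batch in batch_results for value in batch]


squares = fetch_in_parallel(list(range(10)), lambda batch: [k * k for k in batch],
                            batch_size=3)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Smaller batches keep each individual call short, and running them concurrently reduces total wall-clock time compared with issuing the batches one after another, as long as the calls are I/O-bound.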