Some more details from searching the logs:

The latest burst (we have about one every 1-2 hours) looks like this:
We had about 90 instances up.
We got DeadlineExceeded errors for about two minutes. During the first
30 seconds, 50 successive failing requests have no pending_ms. Then,
during the next 1m30s, nearly all of them have pending_ms and one
third of them are loading requests.

So my guess is that it's not linked to the scheduler: something goes
wrong, requests kill the instances during the first 30 seconds, and
then keep killing the remaining ones with increased latency until
whatever went wrong is resolved.
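
For reference, here is roughly how I plan to set the datastore
timeouts mentioned below. This is only a sketch using the db API; the
helper name and the 5-second deadline are my own choices, not anything
we run yet:

    import logging
    from google.appengine.ext import db

    def get_with_deadline(keys, deadline=5):
        # Give the datastore call its own deadline so a slow RPC fails
        # fast instead of eating the whole request deadline.
        rpc = db.create_rpc(deadline=deadline)
        try:
            return db.get(keys, rpc=rpc)
        except db.Timeout:
            # The datastore did not answer in time; log it and let the
            # caller decide whether to retry or degrade gracefully.
            logging.warning('datastore get timed out after %ss', deadline)
            return None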
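
And since Robert suggested the async interfaces (the link is further
down in the thread), this is roughly the kind of parallel batched
fetch I have in mind if the timeouts alone don't help. Again just a
sketch, and the batch size is a guess:

    from google.appengine.ext import db

    def get_in_parallel_batches(keys, batch_size=50):
        # Start one async get per batch instead of fetching serially.
        rpcs = [db.get_async(keys[i:i + batch_size])
                for i in range(0, len(keys), batch_size)]
        entities = []
        for rpc in rpcs:
            # get_result() blocks until that batch has returned.
            entities.extend(rpc.get_result())
        return entities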


On Aug 3, 10:20, Alexis <alexis.hanico...@gmail.com> wrote:
> We are using Python, and we are not using backends or the task queue.
> But most of our requests fetch entities of the same kind, so "kind-
> locking" could be relevant.
> I'll set up timeouts on datastore operations to see if this is what is
> going wrong (it seems like a good idea to set these anyway).
>
> It seems logical to have pending_ms values if the app has no instances
> available instead of the one hundred that used to serve the traffic...
> However, requests should still be able to complete in less than 20
> seconds. Nearly all the requests failing with this error while the
> instances get killed off have this pending_ms value, and they don't
> have loading_request=1.
> So I'm not sure whether these DeadlineExceeded errors come first or
> whether they are a consequence of the instances being killed: they are
> not loading requests, and the warning in the logs says the request may
> kill the instance, but the pending_ms values show that we already lack
> instances.
>
> The traffic is very steady.
>
> On Aug 3, 05:49, Robert Kluin <robert.kl...@gmail.com> wrote:
>
> > Interesting.  I've been seeing exactly the same strange behavior across
> > several apps as well.  Suddenly instances will get killed and
> > restarted in large batches.  This happens even with low request
> > latency, small memory usage (similar to yours, < 50 MB), low error
> > rates, and steady traffic.  I'm pretty convinced this is tied to the
> > scheduler changes they've been making over the past few weeks.
>
> > As a side note, the pending_ms value (9321) indicates that the request
> > sat there waiting to be serviced for quite a long time.  That doesn't
> > leave much time to actually handle the request.  Do you always see
> > bursts of those when your instances get killed off?  Are you getting
> > big spikes in traffic when this happens, or is it steady?
>
> > Robert
>
> > On Tue, Aug 2, 2011 at 05:24, Alexis <alexis.hanico...@gmail.com> wrote:
> > > Hi,
>
> > > I've got a similar issue: lots of DeadlineExceeded errors over the last
> > > few weeks. I'm on the master-slave datastore too, but what I'm reporting
> > > happened again an hour ago.
>
> > > These errors happen in bursts, and I recently realized that each burst
> > > was in fact shutting down ALL instances of the application.
> > > (In the logs, I also have this warning: A serious problem was
> > > encountered with the process that handled this request, causing it to
> > > exit. This is likely to cause a new process to be used for the next
> > > request to your application. If you see this message frequently, you
> > > may be throwing exceptions during the initialization of your
> > > application. (Error code 104))
> > > This does not happen while an instance is spinning up, but after it has
> > > been running for several hours.
>
> > > The trace I get along with the DeadlineExceeded errors shows that it
> > > happens in the second phase: while the app is trying to fall back
> > > gracefully because of another error (one that does not appear in the logs).
> > > The reported request processing time can look like this: ms=100878
> > > cpu_ms=385 api_cpu_ms=58 cpm_usd=0.010945 pending_ms=9321
>
> > > Here is a screenshot of the admin page, showing that all instances
> > > were shut down about 7 minutes ago, even the resident ones:
> > > http://dl.dropbox.com/u/497622/spinDown.png
>
> > > The app does work in batches (although not always small ones), but the
> > > request processing time is usually good enough (see the average latency
> > > in the screenshot).
> > > I'm trying things on my test applications to see what can be wrong, but
> > > it's still not clear to me and I'm running short of ideas...
>
> > > Any suggestions?
>
> > > On Aug 2, 06:21, Robert Kluin <robert.kl...@gmail.com> wrote:
> > >> Hi Will,
> > >>   I assume this is on the master-slave datastore?  I think there were
> > >> a number of large latency spikes in both the datastore and serving
> > >> last week.
>
> > >>   Some things to try:
> > >>     - do work in smaller batches.
> > >>     - if you're doing work serially, do it in batches.
> > >>     - use the async interfaces to do work in batches, but run the
> > >>       batches in parallel.
>
> > >>      http://code.google.com/appengine/docs/python/datastore/async.html
>
> > >> Robert
>
> > >> On Fri, Jul 29, 2011 at 18:35, Will Reiher <wrele...@gmail.com> wrote:
> > >> > I'm trying to debug this issue but I keep hitting a wall.
> > >> > I keep trying new things on one of my deployments to see if I can get
> > >> > the number of errors down, but nothing seems to help. It all started in
> > >> > the last week or so. I also have some existing deployments that I have
> > >> > not changed, and they are seeing these same errors even though the code
> > >> > was never changed and had been stable.
> > >> > 1. This is happening on existing code that has not changed recently.
> > >> > 2. The DeadlineExceededErrors are coming up randomly and at different
> > >> > points in the code.
> > >> > 3. Latency is pretty high and App Engine seems to be spawning a lot of
> > >> > new instances beyond my 3 included ones.

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
