Thanks edga...@google.com,

You've given the first genuinely useful answer! It makes more sense that 
this could be due to the rate of 500 errors in our services. 

However, the service in question has only 28 5xx errors in the past week 
that are not due to "The process handling this request unexpectedly died."
In contrast, it has about 200 errors that ARE due to "The process handling 
this request unexpectedly died." in the past week.

I'm a bit incredulous that 28 5xx errors resulted in a high enough rate to 
generate all these "process died" errors, so I suspect there is some 
non-ideal behavior of the instance scheduler or a bug on Google's end.

That being said, thank you, because this gives me a lead at least. 

We do have 2 suspicious 500 error requests that resulted in 204 error codes 
in the past week: "A problem was encountered with the process that handled 
this request, causing it to exit."
So I'll address that issue; it's one place where we're still using a raw 
HttpServlet instead of a proper REST API Framework. 

As I said, I am still finding it very hard to believe that there's not an 
issue on Google's end. We're getting hundreds and hundreds of these 203 
errors, far away in time from the legitimate 500 errors that are due to our 
application code.
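
To put a number on "far away in time", here is a minimal sketch of the check 
suggested in the quoted message: for each 203, count how many consecutive 5xx 
responses the same instance served right before it. The LogEntry shape and 
both function names are hypothetical; this assumes the request logs have 
already been exported into memory and does not call any App Engine API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class LogEntry:
    # Hypothetical record shape; real App Engine request logs carry more fields.
    instance_id: str
    timestamp: datetime
    status: int       # HTTP status code
    error_code: int   # App Engine error code (e.g. 203, 204), 0 if none


def prior_5xx_streak(entries: List[LogEntry], target: LogEntry) -> int:
    """Count consecutive 5xx responses served by the same instance
    immediately before `target`, skipping other 203 entries."""
    earlier = sorted(
        (e for e in entries
         if e.instance_id == target.instance_id
         and e.timestamp < target.timestamp),
        key=lambda e: e.timestamp,
        reverse=True,  # walk backwards in time from the 203
    )
    streak = 0
    for e in earlier:
        if e.error_code == 203:
            continue  # another queued request killed by the same shutdown
        if 500 <= e.status <= 599:
            streak += 1
        else:
            break  # a non-5xx response ends the sequential run
    return streak


def streaks_before_203(entries: List[LogEntry]) -> List[Tuple[str, datetime, int]]:
    """Return (instance_id, timestamp, streak) for every 203 entry."""
    return [(e.instance_id, e.timestamp, prior_5xx_streak(entries, e))
            for e in entries if e.error_code == 203]
```

If most 203s come back with a streak of 0, they were not preceded by 
sequential 5xx errors on that instance, which would support the suspicion 
that something else is terminating the instances.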

Thanks,
Charles

On Friday, August 10, 2018 at 3:46:07 PM UTC-7, edgaral...@google.com wrote:
>
> When an instance returns too many sequential 5xx errors, our instance 
> scheduling system will consider the instance unhealthy. The instance 
> scheduler will then terminate this instance.  
>
> When an instance gets terminated, and it still has a request queued, the 
> queued request will throw the 203 error. While the instance is being 
> terminated, no new requests get queued for that instance by our scheduler. 
>
> This means that the 203 only gets thrown when there's a request queued 
> when the instance gets terminated, and only for the request that was 
> queued. 
>
> The root cause is/should in fact be that our instance scheduler will 
> terminate an instance that serves too many sequential 5xx errors, and this 
> is expected and desired behavior.
>
> The cause of the problem that needs to be addressed is this high incidence 
> of 5xx errors. You could filter the logs by the instance ID and look at the 
> 5xx errors prior to the instance shutdown to verify this claim.
