Did this get resolved?  We have a Flex Java app running in development 
with almost no traffic. We are constantly getting 502s telling us to try 
again in 30 seconds, and our server app is rebooted many times a day. I am 
running this locally without any problems and not seeing any errors in the 
logs.

On Friday, 10 February 2017 06:07:26 UTC, Vinay Chitlangia wrote:
>
>
>
> On Thu, Feb 9, 2017 at 7:52 PM, 'Nicholas (Google Cloud Support)' via 
> Google App Engine <google-a...@googlegroups.com> wrote:
>
>> I realize that we've already begun investigating this here, but I think 
>> this would be most appropriate for the App Engine public issue tracker.  
>> The issue is becoming increasingly specific, and I suspect it will 
>> require some exchange of code or project details to reproduce the 
>> behavior you've described.  We monitor that issue tracker closely.
>>
>> When filing a new issue on the tracker, please link back to this thread 
>> for context while posting a link to the issue here so that others in the 
>> community can see the whole picture.
>>
>>    - Be sure to include the latest logs related to the *502*s.  When 
>>    viewing the logs in Stackdriver Logging, for instance, include *All 
>>    logs* rather than just *request_log*, as the *nginx.error*, *stderr*, 
>>    *stdout* and *vm.** logs may reveal clues as to a root cause.
>>    - Mention if you are using any middleware, such as servlet filters, 
>>    that may receive requests before the actual handler.
>>    - Lastly, include what the CPU and/or memory usage looks like on the 
>>    instance(s) at the time of the 502s.  Screenshots of *Utilization *and 
>>    *Memory Usage* graphs from the Developers Console will likely be 
>>    sufficient
>>    
>> I look forward to this issue report.
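To illustrate the middleware point above: a filter that runs in front of the handler can answer (or drop) a request before the application code ever logs it. This is a minimal, hypothetical sketch in plain Java, not the actual javax.servlet API:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of filter-style middleware: each filter sees the
// request before the final handler does, so a misbehaving filter can answer
// a request that the handler's own entry log line never records.
public class FilterChainSketch {

    public interface Filter {
        String apply(String request, Function<String, String> next);
    }

    // Build the chain back-to-front so filters run in list order before the handler.
    public static String run(String request, List<Filter> filters,
                             Function<String, String> handler) {
        Function<String, String> chain = handler;
        for (int i = filters.size() - 1; i >= 0; i--) {
            final Filter f = filters.get(i);
            final Function<String, String> next = chain;
            chain = req -> f.apply(req, next);
        }
        return chain.apply(request);
    }

    public static void main(String[] args) {
        // A filter that short-circuits /read requests before the handler runs.
        Filter rejectRead = (req, next) ->
                req.equals("/read") ? "502 from filter" : next.apply(req);
        Function<String, String> handler = req -> "handled " + req;
        System.out.println(run("/read", List.of(rejectRead), handler));  // → 502 from filter
        System.out.println(run("/ping", List.of(rejectRead), handler));  // → handled /ping
    }
}
```

In a real servlet container the equivalent components are javax.servlet.Filter implementations declared in web.xml or via annotations, which is why Nicholas asks about them here.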
>>
> https://code.google.com/p/googleappengine/issues/detail?id=13543
> The logs are "All logs" around the time of the incident, though only as a 
> copy/paste from the browser. I couldn't retrieve any logs using gcloud 
> beta logging read. This is the command I tried:
> gcloud beta logging read 'timestamp >= "2017-02-11T03:00:00Z" AND 
> timestamp <="2017-02-12T03:05:00Z"' 
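For anyone trying the same query: a bare timestamp filter may return nothing, and restricting the filter to the App Engine resource type may help. The extra filter term and flags below are assumptions to adapt for your own project, not a verified fix:

```shell
# Hedged sketch: scope the query to App Engine logs and the incident window.
# resource.type, --project, --limit and --format are assumptions to adjust.
gcloud beta logging read \
  'resource.type="gae_app" AND timestamp >= "2017-02-11T03:00:00Z" AND timestamp <= "2017-02-12T03:05:00Z"' \
  --project your-project-id \
  --limit 200 \
  --format json
```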
>
>>
>> On Wednesday, February 8, 2017 at 1:24:01 PM UTC-5, Vinay Chitlangia 
>> wrote:
>>>
>>>
>>>
>>> On Wed, Feb 8, 2017 at 10:29 PM, 'Nicholas (Google Cloud Support)' via 
>>> Google App Engine <google-a...@googlegroups.com> wrote:
>>>
>>>> Hey Vinay Chitlangia,
>>>>
>>>> Thanks for some preliminary troubleshooting and linking this 
>>>> interesting article.  App Engine runs Nginx processes to handle routes to 
>>>> your application's handlers.  Handlers serving static assets for instance 
>>>> are handled by this Nginx process and the resources are served directly, 
>>>> thus bypassing the application altogether to save on precious application 
>>>> resources.
>>>>
>>>> The Nginx process will often serve a *502* if the application raises 
>>>> an exception, an internal API call raises an exception or if the request 
>>>> simply takes too long.  As such, the status code by itself does not 
>>>> tell us much.
>>>>
>>>> Looking at the GAE logs for your application, I found the *502*s you 
>>>> mentioned.  One thing I noticed is that they all occur from the */read* 
>>>> endpoint.  From the naming, I assume this endpoint is reading some data 
>>>> from BigTable.  Investigating further, perhaps you could provide some 
>>>> additional information:
>>>>
>>>>    - What exactly is happening at the */read* endpoint?  A code sample 
>>>>    would be ideal if that's not too sensitive.
>>>>    
>>>> As you surmised, we are reading some data from bigtable in this 
>>> endpoint.
>>>
>>>>
>>>>    - What kind of error handling exists in said endpoint if the 
>>>>    BigTable API returns non-success responses?
>>>>    
>>> The entire endpoint is in a try/catch block catching Exception. In the 
>>> case of failure, the exception stack trace gets written to the logs.
>>> The first line of the endpoint is a log message signalling receipt of 
>>> the request (added for this debugging, of course!).
>>> For successful requests, that introductory log message gets written; 
>>> for the 502 ones, it never does.
>>> For requests that fail because of Bigtable-related errors, the logs 
>>> have the stack trace, but not for the 502s.
>>> The 502 failure requests finish in <10 ms.
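The logging structure described above, sketched as a self-contained example (the class, method, and message names are hypothetical, not the actual application code):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of the /read handler's logging structure as described
// in the thread: log on entry as the absolute first line, wrap the body in
// try/catch on Exception, and log the stack trace on failure.
public class ReadHandlerSketch {
    private static final Logger log = Logger.getLogger(ReadHandlerSketch.class.getName());

    // Returns the response body, or an error marker if the read fails.
    public static String handleRead(boolean simulateBigtableFailure) {
        log.info("received /read request");  // first line of the endpoint
        try {
            if (simulateBigtableFailure) {
                throw new RuntimeException("simulated Bigtable error");
            }
            return "ok";
        } catch (Exception e) {
            // Stack trace goes to the logs, as seen for Bigtable-related errors.
            log.log(Level.SEVERE, "read failed", e);
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRead(false));  // → ok
        System.out.println(handleRead(true));   // → error
    }
}
```

With this structure, a request that produces no entry log line at all was plausibly never delivered to the servlet, which is the inference drawn below.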
>>>
>>>>
>>>>    - Can you log various steps in the */read* endpoint?  This might 
>>>>    help identify the progress the request reaches before the *502* is 
>>>>    served.  It would also help in confirming that your application is 
>>>> actually 
>>>>    even getting the request as I can't currently confirm that from the 
>>>> logs.
>>>>    
>>> My best guess is that the request does not make it to the servlet. The 
>>> reason is that of the hundreds of failed 502 requests I have seen in 
>>> the logs, not one has the log message that is the absolute first line 
>>> in the code of the read handler.
>>>
>>>>
>>>>    - If said endpoint does in fact read from BigTable, what API and 
>>>>    java library are you using?
>>>>
>>> We are using the Google-provided bigtable-hbase-1.2 jars, version 0.9.4.
>>>
>>>> Regarding the article you linked, while the configuration of an HTTPS 
>>>> load balancer and nginx.conf can be very important, both the load 
>>>> balancing 
>>>> component and nginx.conf are out of the hands of the developer with App 
>>>> Engine.  Your scaling settings, health check settings and handlers in the 
>>>> app.yaml are the only rules over which you have control that affect load 
>>>> balancing and nginx rules.
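For reference, the scaling and health-check knobs mentioned above live in app.yaml. A minimal, illustrative sketch for a Flex service (values are placeholders, not recommendations, and field names follow the 2017-era flexible environment app.yaml reference):

```yaml
# Hypothetical app.yaml sketch for a Java Flex service; values are illustrative.
runtime: java
env: flex

automatic_scaling:
  min_num_instances: 2
  max_num_instances: 10

health_check:
  enable_health_check: true
  check_interval_sec: 5
  timeout_sec: 4
  unhealthy_threshold: 2
  healthy_threshold: 2
```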
>>>>
>>>> On Wednesday, February 8, 2017 at 11:27:43 AM UTC-5, Vinay Chitlangia 
>>>> wrote:
>>>>>
>>>>> Might be related:
>>>>>
>>>>> https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.6k2laoada
>>>>>
>>>>> The symptoms mentioned in this blog (somewhat moderate request 
>>>>> volume, no logs) match our observations.
>>>>>
>>>>> I do not see the 
>>>>> "backend_connection_closed_before_data_sent_to_client" status in the logs.
>>>>>
>>>>> The error message for a failed request received by the client is:
>>>>> 11:12:44.549 com.yawasa.server.storage.RpcStorageService LogError: 
>>>>> <html><head><title>502 Bad Gateway</title></head><body 
>>>>> bgcolor="white"><center><h1>502 Bad 
>>>>> Gateway</h1></center><hr><center>nginx</center></body></html> (
>>>>> RpcStorageService.java:137 
>>>>> <https://console.cloud.google.com/debug/fromlog?appModule=default&appVersion=1&file=RpcStorageService.java&line=137&logInsertId=589569d9000e7bf6825479e4&logNanos=1486186963359794000&nestedLogIndex=0&project=village-test>
>>>>> )
>>>>>
>>>>> The mention of nginx in the log message appears promising. We are not 
>>>>> using nginx deliberately, so I am assuming this is something happening 
>>>>> under the hood.
>>>>>
>>>>> On Tuesday, February 7, 2017 at 11:08:55 AM UTC+5:30, Vinay Chitlangia 
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> We are seeing intermittent occurrences of 502 Bad Gateway error in 
>>>>>> our server.
>>>>>> About 0.5% requests fail with this error.
>>>>>>
>>>>>> Our setup is:
>>>>>> Flex running jetty9-compat
>>>>>> F1 machine
>>>>>> 1 server
>>>>>>
>>>>>> Our request pattern is bursty, so the server gets ~30 requests in 
>>>>>> parallel.
>>>>>> The failures, when they happen, are clustered: over a period of 
>>>>>> roughly 10 seconds one would see 3-4 errors.
>>>>>>
>>>>>> The requests which complete successfully finish in 50-100 ms, so it 
>>>>>> does not appear that the server is under major load and unable to 
>>>>>> keep up.
>>>>>> To rule out this possibility, I started the server with 5 replicas; 
>>>>>> however, the failure percentage did not change.
>>>>>>
>>>>>> From the looks of it, it appears that there is some throttling or 
>>>>>> quota issue at play. I tried tweaking the max-concurrent-requests 
>>>>>> parameter, setting it to 300, but that did not make any difference 
>>>>>> either.
>>>>>>
>>>>>> I do not see new instances being created at the time of failure 
>>>>>> either.
>>>>>>
>>>>>>
>>>>>> The request log for the failed request:
>>>>>> 09:57:30.686 POST /read 502 262 B 4 ms AppEngine-Google; (+
>>>>>> http://code.google.com/appengine; appid: s~village-test)
>>>>>> 107.178.194.3 - - [07/Feb/2017:09:57:30 +0530] "POST /read HTTP/1.1" 
>>>>>> 502 262 - "AppEngine-Google; (+http://code.google.com/appengine; 
>>>>>> appid: s~village-test)" ms=4 cpu_ms=0 
>>>>>> cpm_usd=2.9279999999999998e-8 loading_request=0 instance=- 
>>>>>> app_engine_release=1.9.48 trace_id=-
>>>>>> {
>>>>>> protoPayload: {…}  
>>>>>> insertId: "58994cb30002335cb47fd364"  
>>>>>> httpRequest: {…}  
>>>>>> resource: {…}  
>>>>>> timestamp: "2017-02-07T04:27:30.686052Z"  
>>>>>> labels: {…}  
>>>>>>
>>>>>> operation: {…}  
>>>>>> }
>>>>>>
>>>>>> Looking around at other logs at around the time of the failure, I see:
>>>>>> 09:57:30.000 [error] 32#32: *35107 recv() failed (104: Connection 
>>>>>> reset by peer) while reading response header from upstream, client: 
>>>>>> 169.254.160.2, server: , request: "POST /read HTTP/1.1", upstream: 
>>>>>> "http://172.17.0.4:8080/read", host: "bigtable-dev.appspot.com"
>>>>>> AFAICT this request never made it to our servlet.
>>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to a topic in the 
>>>> Google Groups "Google App Engine" group.
>>>> To unsubscribe from this topic, visit 
>>>> https://groups.google.com/d/topic/google-appengine/zHSuoxkmqjw/unsubscribe
>>>> .
>>>> To unsubscribe from this group and all its topics, send an email to 
>>>> google-appengi...@googlegroups.com.
>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/google-appengine.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/google-appengine/ea48946b-fbd9-47af-a7b4-136493f0d583%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/google-appengine/ea48946b-fbd9-47af-a7b4-136493f0d583%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>
