Guys, it seems that my diagnosis was wrong... Looking through the logs 
I've just seen that the issues are still there, with slow queries and lock 
timeouts.
I've also seen the SSL trace many times: "{communication} unable to perform 
ssl handshake: Connection reset by peer : 104".

Maybe we should start with that SSL trace: in which situations does 
ArangoDB output that log?

Thanks,
Thomas

On Tuesday, April 18, 2017 at 8:41:33 PM UTC+8, Thomas Weiss wrote:
>
> Also if it can help, it happened with 3.1.15 on Ubuntu 16.04
>
> On Tuesday, April 18, 2017 at 7:46:12 PM UTC+8, Frank Celler wrote:
>>
>> Thomas has shared with me a (private) Azure account we can try. Will post 
>> the result here.
>>
>> On Tuesday, April 18, 2017 at 13:40:46 UTC+2, Jan wrote:
>>>
>>> Hi Thomas,
>>>
>>> thanks for the analysis you did! 
>>> That means you are connecting to Azure Table Storage from Foxx via the 
>>> request module and SSL, right? Which SSL protocol are you using to connect 
>>> to it?
>>> And the problem seems to happen (not confirmed) when Azure Table Storage 
>>> has higher response times than usual? 
>>>
>>> And do you happen to remember who answered what and when on Slack 
>>> regarding the TLS support changes? AFAIK we fixed a few bugs in the TLS 
>>> code in 3.1 recently, but I am not aware of any changes that introduced new 
>>> issues there. And TLS support should have been there in 3.0 already. So I 
>>> am wondering if you could provide some more info on this.
>>>
>>> Thanks!
>>> Jan
>>>
>>> On Monday, April 17, 2017 at 10:50:21 UTC+2, Thomas Weiss wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I just wanted to share with you my recent experience in troubleshooting 
>>>> strange problems.
>>>>
>>>> Background: This project uses Foxx where most of the app logic is 
>>>> implemented. From Foxx functions, I used the request module to post events 
>>>> to Azure Table Storage.
>>>>
>>>> Everything was really working fine until ~2 weeks ago when I started to 
>>>> notice that my ArangoDB instances would sometimes go through some "apnea" 
>>>> with:
>>>> - requests taking a long time to run (many minutes!)
>>>> - lock timeouts in Foxx transactions
>>>> - general performance degradation with the web dashboard not available
>>>> Those issues would last for 10 to 15 minutes and everything would get 
>>>> back to normal.
>>>>
>>>> I first suspected my code to be at fault and spent a lot of time trying 
>>>> to figure out what triggered those problems. But then I found out that:
>>>> - both staging and production environments were impacted, but they were 
>>>> not running the same version of my app (prod was more than a week older)
>>>> - when those apneas happened, I would sometimes get error logs about SSL 
>>>> handshakes
>>>> - (not confirmed) issues in prod and staging would happen approximately 
>>>> at the same time
>>>> - (not confirmed) issues would happen when Azure Table Storage 
>>>> had higher response times
>>>>
>>>> I asked on Slack about the SSL handshake thing and someone answered 
>>>> that there was a bug introduced with TLS support (which I guess was 3.1), 
>>>> and then it hit me that I had upgraded my instances from 3.0.10 to 3.1.15 not 
>>>> too long ago.
>>>>
>>>> So I decided to change the flow of events within the system (not a 
>>>> small change!) to avoid having Arango use the request module. This was 
>>>> deployed nearly a week ago, and I haven't had any problems since then!
>>>>
>>>> Cheers,
>>>> Thomas
>>>>
>>>
