On Thu, Jun 14, 2018 at 11:29:39AM +0200, Janusz Dziemidowicz wrote:
> 2018-06-14 11:14 GMT+02:00 Willy Tarreau <w...@1wt.eu>:
> > Yep it's really not easy and probably once we find it I'll be ashamed
> > saying "I thought this code was not merged"... By the way yesterday I
> > found another suspect but I'm undecided on it; the current architecture
> > of the H2 mux complicates the code analysis. If you want to give it a
> > try on top of previous one, I'd appreciate it, even if it doesn't change
> > anything. Please find it attached.
> 
> Will try.
> 
> I've found one more clue. I've added various graphs to my monitoring.
> Also I've been segregating various traffic kinds into different
> haproxy backends. Yesterday's test shows this:
> https://pasteboard.co/HpPK2Ml6.png
> 
> This backend (sns) is used exclusively for static files that are
> "large" (from 10KB up to over a megabyte) compared to my usual traffic
> (various API calls mostly). Those 5xx errors are not from the backend
> servers, "show stat":
> sns,kr-8,0,0,5,108,,186655,191829829,19744924356,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,1,,186655,,2,7,,147,L7OK,200,1,0,184295,550,0,0,0,,,,,8895,0,,,,,0,OK,,0,12,4,1474826,,,,Layer7 check passed,,2,5,6,,,,10.7.1.8:81,,http,,,,,,,,
> sns,kr-10,0,0,2,105,,186654,191649821,19977086644,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,2,,186654,,2,7,,148,L7OK,200,0,0,184275,551,0,0,0,,,,,8823,0,,,,,0,OK,,0,21,4,1473385,,,,Layer7 check passed,,2,5,6,,,,10.7.1.10:81,,http,,,,,,,,
> sns,BACKEND,0,0,8,213,6554,383553,391967657,39722011000,0,0,,0,0,0,0,UP,200,2,0,,0,15377,0,,1,4,0,,373309,,1,14,,332,,,,0,368563,1101,0,1873,12008,,,,383545,27962,0,0,0,0,0,0,,,0,18,5,1763433,,,,,,,,,,,,,,http,roundrobin,,,,,,,

Oh this is very interesting indeed! So haproxy detected 1873 5xx errors
on this backend that were not attributed to any of these servers. I'm
really not seeing many situations where this can happen, I'll have a
look at this in the code. A common case could be 503s emitted when the
requests die in the queue but you have no maxconn thus no queue. Maybe
we return some 500 from time to time, though I'll have to figure out
why!
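
The comparison between the backend line and its server lines can be
checked mechanically. What follows is only a minimal Python sketch. It
assumes the column order documented for the "show stat" CSV output, in
which hrsp_5xx is the 44th field. The helper names parse_show_stat and
unattributed_5xx are hypothetical, and the sample rows are the three
"sns" lines quoted above, cut to their first 45 fields for brevity.

```python
import csv
import io

# First 45 column names of the "show stat" CSV output, in the order given
# by the HAProxy management guide (assumed here, not taken from this thread).
COLUMNS = [
    "pxname", "svname", "qcur", "qmax", "scur", "smax", "slim", "stot",
    "bin", "bout", "dreq", "dresp", "ereq", "econ", "eresp", "wretr",
    "wredis", "status", "weight", "act", "bck", "chkfail", "chkdown",
    "lastchg", "downtime", "qlimit", "pid", "iid", "sid", "throttle",
    "lbtot", "tracked", "type", "rate", "rate_lim", "rate_max",
    "check_status", "check_code", "check_duration", "hrsp_1xx",
    "hrsp_2xx", "hrsp_3xx", "hrsp_4xx", "hrsp_5xx", "hrsp_other",
]

# The three "sns" rows from the quoted output, truncated after hrsp_other.
SAMPLE = """\
sns,kr-8,0,0,5,108,,186655,191829829,19744924356,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,1,,186655,,2,7,,147,L7OK,200,1,0,184295,550,0,0,0
sns,kr-10,0,0,2,105,,186654,191649821,19977086644,,0,,0,0,0,0,UP,100,1,0,0,0,15377,0,,1,4,2,,186654,,2,7,,148,L7OK,200,0,0,184275,551,0,0,0
sns,BACKEND,0,0,8,213,6554,383553,391967657,39722011000,0,0,,0,0,0,0,UP,200,2,0,,0,15377,0,,1,4,0,,373309,,1,14,,332,,,,0,368563,1101,0,1873,12008
"""


def parse_show_stat(text):
    """Turn "show stat" CSV text into a list of dicts keyed by column name."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        # Skip blank lines and the "# pxname,svname,..." header line.
        if not rec or rec[0].startswith("#"):
            continue
        # zip() stops at the shorter sequence, so extra fields are ignored.
        rows.append(dict(zip(COLUMNS, rec)))
    return rows


def unattributed_5xx(rows, proxy):
    """5xx counted on the BACKEND line minus the sum over its servers."""
    backend_total = 0
    server_total = 0
    for row in rows:
        if row["pxname"] != proxy or row["svname"] == "FRONTEND":
            continue
        count = int(row["hrsp_5xx"] or 0)  # empty field means no counter
        if row["svname"] == "BACKEND":
            backend_total += count
        else:
            server_total += count
    return backend_total - server_total


if __name__ == "__main__":
    print(unattributed_5xx(parse_show_stat(SAMPLE), "sns"))
```

A non-zero result means the backend counted 5xx responses that no
server line accounts for, i.e. responses produced by haproxy itself
rather than relayed from a server.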

Thank you!
Willy
