Advice needed for investigating a potential bug

Patrick Zwahlen Tue, 24 Sep 2024 05:48:42 -0700

Hi,

I ran into an issue with a haproxy instance located between Kibana and an
Elastic cluster:


Kibana ---- h2/tls ---> haproxy ---- tls ---> 3*elastic

This has been running fine with Elastic Stack version 3.14 and haproxy
3.0.x for a couple of weeks.

After an upgrade to Elastic 3.15 (.0 and .1) I cannot log in to Kibana
anymore. The few logs I have indicate a timeout on a PUT request between
Kibana and the Elastic cluster. I don't see such a PUT in the haproxy logs,
so I started investigating.

Downgrading Kibana to 3.14 seems to solve the issue, so something has
changed there.

Pointing Kibana 3.15 directly at an elastic node (bypassing haproxy) solves
the issue as well. Tested both on the master node and a slave. So when it
breaks, it is because of a combination of haproxy 3.0.x/kibana 3.15.

I thought this might be related to haproxy stickiness/loadbalancing.
However, forcing the traffic to a single backend (via haproxy) doesn't
solve the problem. Going down from 3 to 1 backend server doesn't solve the
problem either.

I then tested h2 versus http/1.1 without any success. Forcing http/1.1 on
the frontend doesn't solve the problem.
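
For illustration, forcing http/1.1 on the frontend is typically done through
the ALPN list on the bind line. This is a minimal sketch only: the frontend
name, port, and certificate path are hypothetical placeholders, not the
actual configuration.

```
frontend fe_elastic
    mode http
    # h2 offered to clients (hypothetical port and certificate path):
    # bind :9200 ssl crt /etc/haproxy/certs/proxy.pem alpn h2,http/1.1
    # http/1.1 only:
    bind :9200 ssl crt /etc/haproxy/certs/proxy.pem alpn http/1.1
    default_backend be_elastic
```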

I then tested changing my haproxy config from http mode to tcp mode and it
worked again. So the problem lies somewhere in haproxy's HTTP processing.
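
As a rough sketch of that http-versus-tcp change, assuming TLS is terminated
on both sides as in the diagram above: the section names, server address,
timeouts, and certificate paths are hypothetical placeholders, and only the
"mode" lines are the point.

```
defaults
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend fe_elastic
    mode http     # failing case; switching this to "mode tcp" works
    bind :9200 ssl crt /etc/haproxy/certs/proxy.pem
    default_backend be_elastic

backend be_elastic
    mode http     # must be changed together with the frontend mode
    server es1 192.0.2.11:9200 ssl verify required ca-file /etc/haproxy/certs/ca.pem check
```

In tcp mode haproxy relays the decrypted byte stream without parsing HTTP, so
h2 should not be advertised on the bind line unless the backend also speaks it.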

Finally, I tested a downgrade of haproxy: 2.9 fails as well, but 2.8.11
works!

How would one troubleshoot such a problem? I would like to get debug logs
and compare 2.8.11 (working) versus 3.0.5 (failing) in order to understand
what is going on.

Regards!
