Hi guys!

I have a tough situation here and need help from the community.

We're using HAProxy as a load balancer for a fairly complex web application
deployed on AWS EC2.
One of our HTTP backends is a Tornado web server which uses a NoSQL
cluster for data storage.
It serves plain HTTP on port 80, and HAProxy terminates SSL for incoming
connections. The main cycle of work for this backend is:

1) An incoming PATCH request arrives from the client at HAProxy.
2) HAProxy forwards it to one of the HTTP backend nodes.
3) The backend node extracts all the needed data and puts it into a NoSQL table.
4) The backend then sends a short response back to the client, confirming
the received data (a rough sketch of such a handler follows this list).
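
For reference, here is a stripped-down sketch of roughly what such a handler
looks like. The handler name, table name and fields are made up for this
example, and the blocking boto3 call is purely for illustration - this is
not our actual code:

import json

import boto3
from tornado import ioloop, web


# Purely illustrative - the names here are invented for the example.
class IncomingDataHandler(web.RequestHandler):
    def initialize(self, table):
        self.table = table

    def patch(self):
        item = json.loads(self.request.body)   # extract the needed data
        self.table.put_item(Item=item)         # put it into the DynamoDB table
        self.write({"status": "ok"})           # short confirmation to the client


def make_app():
    table = boto3.resource("dynamodb").Table("incoming_data")  # made-up name
    return web.Application([
        (r"/incoming_data", IncomingDataHandler, dict(table=table)),
    ])


if __name__ == "__main__":
    make_app().listen(80)   # plain HTTP; HAProxy terminates SSL in front of it
    ioloop.IOLoop.current().start()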

Everything worked fine until we decided to migrate to Amazon's DynamoDB.
All local tests without the load balancer pass fine, but when the code is
deployed on real EC2 instances the following happens:

As soon as the backend node receives the request from HAProxy and connects to
DynamoDB to put the data, HAProxy immediately replies to the client with a
502 error (the logs show this happens roughly 100 ms BEFORE the backend's
actual reply). The backend carries on processing the data and returns a 200
response, but by then HAProxy has already closed the session.

Details of the 502 error from HAProxy's log:

client_ip=<redacted> client_port=32668 backend=http_data times=0/0/0/-1/91
http_status=502 bytes_uploaded=478 bytes_read=1807 termination_state=PH--
connections=1/0/0/0/0 queues=0/0 "PATCH /incoming_data HTTP/1.1"

The termination state PH means HAProxy itself aborted the session while
waiting for complete, valid response headers from the server (i.e. it
considered the response invalid), so I checked the output of the
'show errors' command on the stats socket and saw the capture below.
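
(For reference, this is how I pull that capture off the admin socket; just a
rough sketch - the socket path comes from the 'stats socket' line in my own
config and will differ on other setups.)

import socket

# Rough sketch of reading 'show errors' from the HAProxy stats socket.
# /var/run/haproxy.sock is where MY config puts it; adjust for your setup.
def show_errors(path="/var/run/haproxy.sock"):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show errors\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")

print(show_errors())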

Total events captured on [11/Oct/2017:07:27:22.352] : 1

[10/Oct/2017:18:34:51.923] backend http-data (#8): invalid response
  frontend http-ssl (#3), server incoming_data_bk1 (#1), event #1
  src <redacted>:24565, session #12, session flags 0x000004ce
  HTTP msg state 26, msg flags 0x00000000, tx flags 0x08000000
  HTTP chunk len 0 bytes, HTTP body len 0 bytes
  buffer flags 0x00008002, out 0 bytes, total 1453 bytes
  pending 1453 bytes, wrapping at 16384, error at position 0:

  00000  \x17\x03\x03\x05\xA8\x83(\x13\xE5Ph\xA7\xE4\x93}\x04&\xA2\x83}e\xAE!
  00023+ \x1E\xA1\x14|B\x15\xEE\x13\xEDx4\x8Bw"\xC0!\xB8\x97\xCE\xBF\x8E\xC5
  00045+ \xC0\x8C\xF2G$\t\xCC\x97\xE8K[5\xB3\x95d\xD5c#l\x9AI\r2@\xD6\xC0<dBI\e
  00076+ \x03/f\xD8a\xA2\xCAB\xA09\xFA0g\xA1I\x14\xFEM\x9F\xAE*x\xB7\x87\xCA

And there are 100+ more lines of binary data like this.
I've inspected the actual reply from the web server (it is logged right
before sending) and it was only 6 bytes, with all the proper headers.
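
Incidentally, the first bytes of that capture decode cleanly as a TLS record
header, which is what raw HTTPS traffic (e.g. an exchange with DynamoDB)
would look like on the wire. A small check, assuming the capture really is a
raw TLS record:

import struct

# First 5 bytes of the capture shown above.
captured = b"\x17\x03\x03\x05\xA8"

# TLS record header: 1 byte content type, 2 bytes version, 2 bytes length.
content_type, version, length = struct.unpack(">BHH", captured)
print(content_type, hex(version), length)
# -> 23 0x303 1448
# 23 = 'application data', 0x0303 = TLS 1.2, and 5 header bytes + 1448
# payload bytes = 1453, which matches the 'total 1453 bytes' in the capture.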

So that weird data captured by HAProxy is definitely not the actual
response. I've started debugging, and this is what I've found so far:

1) The issue appears ONLY if the backend exchanges data with DynamoDB.
2) It's not 100% reproducible - roughly one in ten requests succeeds.

Then I started HAProxy in debug mode in the foreground to see all the
information about requests and responses on screen.
After I sent the PATCH request, HAProxy got stuck for about 40 seconds
(literally), but after this pause it continued with the proper response
from the backend!

I have no idea what HAProxy was doing during those 40 seconds - there were
no messages on screen at all. After this 'freeze' the normal response from
the backend appeared.
All headers were in place, and Content-Length and Content-Type were correct
when the response was transmitted.

So it looks like HAProxy somehow intercepts the data exchange between the
backend and DynamoDB and treats that data as the reply.
Has anyone seen something like this before? Any suggestions on how to
debug or fix this? What might be the cause?

I've checked this with all the main releases of HAProxy - 1.5, 1.6 and
1.7 - and all of them show the same behavior.

Thanks!
