Hi Dormando,

On Fri, May 01, 2015 at 12:52:55PM -0700, dormando wrote:
> Hi,
> 
> We're running a haproxy as a TLS unwrapping daemon (socket to socket) and
> are running into some cases where processes will spin at 100% CPU for 5-30
> seconds.
> 
> It looks related to half-closed or resetting TCP connections out to end
> users, and always self-recovers after some amount of time.
> 
> The symptoms differ slightly:
> 
> Usually it looks like this:
> epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) = 1
> read(8881, 0x870ff53, 5)                = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) = 1
> read(8881, 0x870ff53, 5)                = -1 EAGAIN (Resource temporarily unavailable)

(...)

That's bad: as you can see, you're facing a kernel bug. The kernel tells
haproxy that it can read from this fd, and when it tries, it cannot. What
kernel is this?

> EPOLLIN is set, EPOLLHUP is set, and EPOLLRDHUP (0x2000, which is also
> mapped to POLL_HUP internally). The read() always fails as EAGAIN and it
> drops immediately back into the epoll loop.

Yes, since that's exactly what EAGAIN means: "please poll before doing this".
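Roughly, this is the pattern EAGAIN asks for (just a sketch to illustrate
the point, not haproxy's actual code):

    #include <errno.h>
    #include <unistd.h>

    /* read once; on EAGAIN, stop and go back to epoll_wait() instead
     * of retrying immediately */
    static int try_read(int fd, char *buf, size_t len)
    {
        ssize_t ret = read(fd, buf, len);

        if (ret > 0)
            return (int)ret;   /* got data */
        if (ret == 0)
            return 0;          /* peer closed cleanly */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -1;         /* nothing yet, wait for the next EPOLLIN */
        return -2;             /* real error, caller should close */
    }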

> Occasionally the syscalls are recvfrom() instead of read().

Not surprised: read() is used by openssl while recvfrom() is used by
haproxy, but internally in the kernel both are mapped to the same code
path.
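For a connected stream socket the two calls do exactly the same thing;
purely as an illustration:

    #include <sys/socket.h>
    #include <unistd.h>

    /* both end up in the same kernel receive path for a connected fd */
    ssize_t via_read(int fd, void *buf, size_t len)
    {
        return read(fd, buf, len);
    }

    ssize_t via_recvfrom(int fd, void *buf, size_t len)
    {
        return recvfrom(fd, buf, len, 0, NULL, NULL);
    }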

> Occasionally the EPOLLERR flag is *also* set, yet it still loops. So the
> connection has a fatal problem.

That's interesting because it means it can also return EAGAIN on faulty
connections! I'm wondering whether this could be caused by memory
corruption: if the fd number is corrupted in the epoll list, a wrong fd
is returned, and for sure trying to access it results in a failure.
That's just a guess, of course.

> Occasionally it'll call epoll_wait() in a tight loop, with a combination of
> the above options, but never make a syscall.

That's interesting as well; it fuels the theory of a corrupted fd number.
What you see means that the fd in question is closed, so haproxy cannot do
anything with it.

> ...and I've seen it with just EPOLLIN|RDHUP and no HUP.

Yes, you can regularly see a combination of them. Have you ever seen it
without HUP or RDHUP? It's interesting that it seems to only happen on
closes.

> The src/ssl_sock.c:ssl_sock_to_buf func appears to be most of the problem.
> Compared to the raw socket function, nothing is checking for POLL_ERR.

This is normal. The presence of POLL_ERR doesn't preclude your ability to
read pending queued data from the socket. For example, you can face a write
error and still have to read the end of the response. A lot of agents make
this mistake. We made it as well a long time ago, and once in a while
people were complaining about truncated responses.
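To illustrate the principle (a minimal sketch, not what haproxy does
literally): when POLL_ERR is reported, you still drain the receive queue
first, and only give up once read() itself reports the end or the error:

    #include <errno.h>
    #include <unistd.h>

    static void drain_then_close(int fd)
    {
        char buf[4096];
        ssize_t ret;

        for (;;) {
            ret = read(fd, buf, sizeof(buf));
            if (ret > 0)
                continue;   /* pending data was read into buf; a real
                             * proxy would forward it to the other side */
            if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                return;     /* queue empty for now, keep polling */
            break;          /* 0 = EOF, or a real error: we're done */
        }
        close(fd);
    }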

> However I'm not confident I know where to place the HUP detection, or how
> it's looping with epoll_wait() without doing any syscalls.

For me there's nothing you can do on the FDs themselves, especially the
ones that are already closed. If you want, you can try to perform a close
when you see POLL_ERR, and then I bet you'll see epoll_wait() return the
same fd again and loop without any syscall.

> OpenSSL internally buffers partial packets, but if the remote end is HUP'ed
> there's probably no way you'll get the rest of the read.
> 
> I'm also not really sure how the sockets can end up in this state without
> haproxy immediately closing them (IN + HUP + EAGAIN -> socket was closed,
> right?).

No, EAGAIN is not possible if the socket is closed. That's why I can say
for sure that it returns the wrong socket and you're facing a perfectly
valid one with nothing to read from.

> It's an obvious enough weirdness that I feel like I'm reading this
> wrong.

Welcome to the world of epoll :-)
It's fast, efficient, powerful, tricky and ugly at the same time!

> It does also eventually wiggle into a state where haproxy closes the
> conn completely, which is why it doesn't spin forever.

In fact in the past we implemented a workaround for an epoll bug where
the close was not always implicit despite being documented, so there's
an epoll_ctl(del) which is done prior to closing, and that may be what
eventually gets epoll to stop bugging you about that fd.
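In raw syscall terms, the workaround is essentially this (a sketch,
assuming <epfd> is the epoll instance the fd is registered on):

    #include <sys/epoll.h>
    #include <unistd.h>

    static void remove_and_close(int epfd, int fd)
    {
        /* explicitly deregister before close(); errors are ignored
         * since the fd may already have been removed from the set */
        epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
        close(fd);
    }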

> It's not clear if
> that's from a timeout or a state change in the socket.
> 
> Anything else I can dig up on this? We have some small modifications to
> haproxy, but largely in unrelated areas of the code. I can share the
> changes privately with someone if necessary. Haven't figured out a test
> case yet, hoping the description makes the issue obvious to the authors.

Feel free to share your changes on the mailing list, preferably with one
mail per patch to ease review.

For the rest, could you please share your kernel version and your config,
just in case something rings a bell?

Regards,
Willy

