On Thu, May 08, 2014 at 01:46:30PM +0200, Willy Tarreau wrote:
> 
> On Thu, May 08, 2014 at 12:47:06PM +0200, John-Paul Bader wrote:
> > Ok after about roughly another 2 hours the next core dump happened - 
> > this time with the nokqueue option set.
> > 
> > The coredump looks very similar and now crashes with ev_poll instead of 
> > ev_kqueue:
> 
> Indeed it's exactly the same.
> 
> > (gdb) bt full
> > #0  kill_mini_session (s=0x80337f800) at src/session.c:299
> >     level = 6
> >     conn = (struct connection *) 0x0
> >     err_msg = <value optimized out>
> > #1  0x0000000000463928 in conn_session_complete (conn=0x802e29480) at 
> > src/session.c:355
> >     s = (struct session *) 0x80337f800
> > #2  0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at 
> > src/connection.c:88
> >     conn = <value optimized out>
> >     flags = 41997063
> 
> That's 0x0280d307 :
> 
>   - {SOCK,DATA,CURR}_RD_ENA=1       => it's a handshake, waiting for reading
>   - {SOCK,DATA,CURR}_WR_ENA=0       => no need for writing
>   - CTRL_READY=1                    => FD is still allocated
>   - XPRT_READY=1                    => transport layer is initialized
>   - ADDR_FROM_SET=1, ADDR_TO_SET=0  => clearly it's a frontend connection
>   - INIT_DATA=1, WAKE_DATA=1        => processing a handshake (ssl I guess)
>   - {DATA,SOCK}_{RD,WR}_SH=0        => no shutdown
>   - ERROR=0, CONNECTED=0            => handshake not completed yet
>   - WAIT_L4_CONN=0                  => normal
>   - WAIT_L6_CONN=1                  => waiting for an L6 handshake to complete
>   - SSL_WAIT_HS=1                   => the pending handshake is an SSL 
> handshake
> 
> So at least what we know is that it receives an SSL handshake, and that
> it dies while processing it and before any error is reported on the
> connection. I'll recheck the error path there in case I find anything.

In fact it's a bit more precise, flags is the copy of connection->flags when
entering the function. It indicates a handshake is in progress. And the only
way to reach line 88 is for the handshake to complete without error. So we
know for sure that ssl_sock_handshake() was called and completed the
handshake then removed the CO_FL_SSL_WAIT_HS flag from the connection.

And that becomes interesting, because ssl_sock_handshake() is cut in several
parts. The first one handles renegociations. We could not enter it because
CONNECTED=0. Then SSL_do_handshake() is called. If it does not return exactly
1, it enters the second block which always returns without clearing the
handshake flag. So we also know we did not enter this place. Thus the
handshake succeeded and went to the reneg_ok label. The third block here
is dedicated to server-side, which is not the case here. And we leave...

So what causes the trouble is done within SSL_do_handshake(), which must
be one of our callbacks. Thus that could still be related to the session
cache in fact. Maybe a corner case remains. I'll check that.

Regards,
Willy


Reply via email to