Hello Sailesh Mukil, Todd Lipcon, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/9606 to review the following change. Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() ...................................................................... KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write() Previously, OutboundTransfer::TransferStarted() returns true iff non-zero bytes have been successfully sent via Writev(). As it turns out, this doesn't work well with SSL_write(). When SSL_write() returns -1 with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly the same buffer pointer next time even if 0 bytes have been written. The following sequence becomes problematic with the previous implementation of OutboundTransfer::TransferStarted(): - WriteHandler() calls SendBuffer() on an OutboundTransfer. - SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above. Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0 and OutboundTransfer::TransferStarted() still returns false. - OutboundTransfer is cancelled or timed out. car->call is set to NULL. - WirteHandler() is called again and as it notices that the OutboundTransfer hasn't really started yet and "car->call" is NULL due to cancellation, it removes it from the outbound transfer queue and moves on to the next entry in the queue. - WriteHandler() calls SendBuffer() with the next entry in the queue and eventually calls SSL_write() with a different buffer than expected by SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error. This change fixes the problem above by adding a boolean flag 'started_' which is set to true if OutboundTransfer::SendBuffer() has been called at least once. Also added some tests to exercise cancellation paths with multiple concurrent RPCs. Confirmed the problem above is fixed by running stress test in a 130 node cluster with Impala. The problem happened consistently without the fix. Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Reviewed-on: http://gerrit.cloudera.org:8080/9587 Reviewed-by: Sailesh Mukil <sail...@cloudera.com> Reviewed-by: Todd Lipcon <t...@apache.org> Tested-by: Todd Lipcon <t...@apache.org> --- M be/src/kudu/rpc/rpc-test-base.h M be/src/kudu/rpc/rpc-test.cc M be/src/kudu/rpc/transfer.cc M be/src/kudu/rpc/transfer.h 4 files changed, 84 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/9606/1 -- To view, visit http://gerrit.cloudera.org:8080/9606 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683 Gerrit-Change-Number: 9606 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>