Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9606 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
......................................................................

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returns true iff
non-zero bytes have been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation
of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
  Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
  and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WirteHandler() is called again and as it notices that the OutboundTransfer
  hasn't really started yet and "car->call" is NULL due to cancellation, it
  removes it from the outbound transfer queue and moves on to the next entry
  in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
  eventually calls SSL_write() with a different buffer than expected by
  SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.

Confirmed the problem above is fixed by running stress test in a 130 node
cluster with Impala. The problem happened consistently without the fix.

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Reviewed-on: http://gerrit.cloudera.org:8080/9587
Reviewed-by: Sailesh Mukil <sail...@cloudera.com>
Reviewed-by: Todd Lipcon <t...@apache.org>
Tested-by: Todd Lipcon <t...@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9606
Tested-by: Impala Public Jenkins
---
M be/src/kudu/rpc/rpc-test-base.h
M be/src/kudu/rpc/rpc-test.cc
M be/src/kudu/rpc/transfer.cc
M be/src/kudu/rpc/transfer.h
4 files changed, 84 insertions(+), 1 deletion(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9606
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9606
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to