see inline responses On Thu, Jun 8, 2017 at 4:47 PM, Ralph Little <rlit...@inetco.com> wrote:
> Hi, > I have been looking into an issue where the native client hangs after > failing to handshake during connection. > We have some boundary conditions where a connection is attempted while the > drill service is starting resulting in handshake failures. > > -- > > DrillClient::connect() ends up ultimately in DrillClientImpl::recvHandshake > (). > > This function calls async_read() with a callback to > DrillClientImpl::handleHandshake to handle the results of handshaking. > However, on error, DrillClientImpl::handleHandshake ends up calling > handleConnError() which merely calls shutdownSocket() to kill the socket > and set m_bIsConnected to false. > When all that unfolds, DrillClientImpl::validateHandshake() ends up > returning CONN_SUCCESS which is clearly wrong because the handshake failed. > > The original caller to DrillClient::connect() thinks everything is > hunky-dorey. > Yes, that would be a problem. From what I remember, the recvHandshake call blocks in m_ioservice.run. On return from run, the recvHandshake checks if the error object m_pError is not null. m_pError is not null iff there has been an error. Do you see this not working correctly? > The following added to DrillClientImpl::recvHandshake() after the > m_io_service.run() line to check the result seems to fix the problem: > > if (!m_bIsConnected) > { > return CONN_HANDSHAKE_FAILED; > } > > Certainly if it addresses the problem, then this is a perfectly good fix. > ...but I wonder if there is a better solution. > Also, although there is the IsActive() member that applications can call to > determine if the connection is up, I wonder if the front-line drill calls > (such as submitQuery()) could check the connection status a matter of > course. > AFAIK, the IsActive method is an artifact of some old initial implementation and I'm not sure if it is used much. > Currently, if you attempt a submitQuery() call when the connection is > down, it just hangs because m_io_service is not running and m_deadlineTimer > never triggers as a fall back. > > Opinions? > It is a good idea to check connection status before sending any message to the server. LMK if you want to submit a patch :), I can review and merge it in. > > Cheers, > Ralph Little >