Qifan Chen has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB 
forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer (NLB) idle timeout, which is fixed at 350s. When a long
DDL operation is in progress and the timeout expires, AWS silently
drops the connection and the Impala client hangs.

The fix preserves the TCLIService protocol between the client and the
Impala server. It applies to the following Impala clients, which issue
the Thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE), or a variant of that
sequence (Beeswax), to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE
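
As an illustration of the client-side pattern described above, here is a
minimal sketch of a polling loop. It is not code from any of the listed
clients: OpState, PollUntilDone() and the get_status callback are
hypothetical names, and the callback only stands in for a
GetOperationStatus() RPC. The real TCLIService protocol defines its own
operation states.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status values, for illustration only.
enum class OpState { RUNNING, FINISHED, ERROR };

// Calls 'get_status' until the operation leaves RUNNING, sleeping for
// 'interval' between calls. Each call is a short request/response, so no
// single RPC has to stay open for the whole duration of the statement.
// If 'num_polls' is not null, it receives the number of status calls made.
inline OpState PollUntilDone(const std::function<OpState()>& get_status,
                             std::chrono::milliseconds interval,
                             int* num_polls) {
  assert(interval.count() >= 0);
  int polls = 1;
  OpState state = get_status();
  while (state == OpState::RUNNING) {
    std::this_thread::sleep_for(interval);
    state = get_status();
    ++polls;
  }
  if (num_polls != nullptr) *num_polls = polls;
  return state;
}
```

A caller would pass a callback that performs the real status RPC and pick
an interval well below the load balancer timeout.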

In the fix, the backend method ImpalaServer::ExecuteStatementCommon()
starts a new thread for ImpalaServer::ExecuteStatementCommonInternal(),
which can reach two states: COMPILED and DONE. COMPILED is reached when
the front end has successfully compiled the query. DONE is reached when
the execution of the query plan either completes successfully or
encounters an error. The main thread, which starts the new thread,
waits for the COMPILED state and then waits for a short period for the
DONE state. If the DONE state is not reached in that period, control
returns to the client, which issues GetOperationStatus() repeatedly to
check whether the execution has reached the DONE state. When the Impala
server detects the FINISHED execution state, or an error occurs while
servicing GetOperationStatus(), the new thread is joined and released.
Thus, for a long DDL query, the execution is done in the new thread and
the Impala client keeps checking its status via GetOperationStatus(),
so no single call waits longer than 350s.
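
The handoff described above can be sketched as follows. This is a
simplified illustration using only the C++ standard library, not the code
in this patch: AsyncStatement, ExecProgress and their methods are
hypothetical names, and the compile/execute callbacks stand in for the
front-end compilation and the plan execution. It assumes a single thread
calls IsDone() at a time.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical progress states shared by the worker and the RPC thread.
enum class ExecProgress { INITIAL, COMPILED, DONE };

class AsyncStatement {
 public:
  // Starts the worker thread, which compiles and then executes.
  AsyncStatement(std::function<void()> compile, std::function<void()> execute)
    : worker_([this, compile = std::move(compile),
               execute = std::move(execute)] {
        compile();
        Advance(ExecProgress::COMPILED);
        execute();
        Advance(ExecProgress::DONE);
      }) {}

  ~AsyncStatement() { Join(); }

  // Blocks until compilation has finished, then waits at most 'grace' for
  // execution to finish. Returns true if DONE was reached in that time.
  bool WaitForCompiledThenDone(std::chrono::milliseconds grace) {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return progress_ != ExecProgress::INITIAL; });
    return cv_.wait_for(
        l, grace, [this] { return progress_ == ExecProgress::DONE; });
  }

  // Non-blocking check, as a status handler might do. Joins the worker
  // once it has reached DONE.
  bool IsDone() {
    {
      std::lock_guard<std::mutex> l(mu_);
      if (progress_ != ExecProgress::DONE) return false;
    }
    Join();
    return true;
  }

 private:
  void Advance(ExecProgress p) {
    {
      std::lock_guard<std::mutex> l(mu_);
      progress_ = p;
    }
    cv_.notify_all();
  }

  void Join() {
    if (worker_.joinable()) worker_.join();
  }

  std::mutex mu_;
  std::condition_variable cv_;
  ExecProgress progress_ = ExecProgress::INITIAL;
  // Declared last so the state above is initialized before the thread runs.
  std::thread worker_;
};
```

In this sketch the initial RPC would call WaitForCompiledThenDone() with a
short grace period and return to the client either way, and later status
requests would call IsDone().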

In addition, a child query, which the Impala server submits as an
Impala client for a compute stats statement, runs synchronously in
the same child query thread.

The communication area between the new thread and the host thread
is per session.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/thread.h
7 files changed, 322 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/11
--
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
