Qifan Chen has uploaded a new patch set (#34). ( http://gerrit.cloudera.org:8080/17872 )
Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS Network Load Balancer (NLB) idle timeout, which is fixed at 350s. When a long-running DDL operation is executing and the timeout expires, AWS silently drops the connection and the Impala client hangs.

The fix keeps the existing TCLIService protocol between the client and the Impala server. It applies to the following Impala clients, which issue the Thrift RPC ExecuteStatement() followed by repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a variant of that sequence (Beeswax):

1. HS2
2. Beeswax
3. Impyla
4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest() can start a new thread, 'async_exec_thread_', that runs ExecDdlRequestImpl(), which executes most DDLs asynchronously. The wait thread 'wait_thread_' waits for this thread. Since the wait thread also runs asynchronously, executing the DDL does not block the Impala client, which can keep checking the execution status via GetOperationStatus() without any single call waiting for a long time (for example, more than 350s).

As an optimization, the asynchronous mode is not applied to the following DDLs, which run a very low risk of long execution:

1. Operations that do not access the catalog service;
2. COMPUTE STATS, since the stats computation queries already run asynchronously.

External behavior changes:

1. A new field, "DDL execution mode:", is added to the summary section of the runtime profile, next to "DDL Type". Its value is either 'asynchronous' or 'synchronous'.
2. A new query option, 'enable_async_ddl_execution', defaulting to true, is added. Setting it to false turns off the asynchronous DDL execution introduced by this patch.
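The threading scheme described above can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions, not Impala's C++ implementation: `RequestState`, `use_async_mode`, `run_client` and the DDL kind strings are hypothetical stand-ins for ClientRequestState, 'async_exec_thread_', 'wait_thread_', the mode selection and the client's GetOperationStatus() loop. It shows that in asynchronous mode no single client call stays open for the duration of the DDL, while the synchronous path keeps one call open until the DDL completes.

```python
import threading
import time

# Hypothetical model of the patch's threading scheme; not Impala's C++ code.
# RequestState, use_async_mode, run_client and the DDL kind strings are
# illustrative assumptions.


def use_async_mode(ddl_type, accesses_catalog, enable_async_ddl_execution=True):
    """Decide whether a DDL runs on a separate exec thread."""
    if not enable_async_ddl_execution:  # query option turned off
        return False
    if not accesses_catalog:  # low-risk operations stay synchronous
        return False
    if ddl_type == "COMPUTE_STATS":  # its stats queries already run async
        return False
    return True


class RequestState:
    """Stand-in for ClientRequestState with an exec thread and a wait thread."""

    def __init__(self, ddl_work, async_mode):
        self._ddl_work = ddl_work  # callable doing the (possibly long) DDL
        self.async_mode = async_mode
        self._state = "RUNNING"
        self._lock = threading.Lock()
        self._exec_thread = None  # plays the role of async_exec_thread_
        self._wait_thread = None  # plays the role of wait_thread_

    def _wait(self):
        # The wait thread, not the client call, blocks until the DDL finishes.
        if self._exec_thread is not None:
            self._exec_thread.join()
        with self._lock:
            self._state = "FINISHED"

    def exec_ddl_request(self):
        """Models ExecuteStatement(): returns at once in asynchronous mode."""
        if self.async_mode:
            self._exec_thread = threading.Thread(target=self._ddl_work)
            self._exec_thread.start()
        else:
            self._ddl_work()  # synchronous: this call lasts as long as the DDL
        self._wait_thread = threading.Thread(target=self._wait)
        self._wait_thread.start()

    def get_operation_status(self):
        """Models GetOperationStatus(): a short, non-blocking status read."""
        with self._lock:
            return self._state


def run_client(request, poll_interval_s=0.01):
    """Submit, then poll until done; return the longest single call in seconds."""
    t0 = time.monotonic()
    request.exec_ddl_request()
    longest_call = time.monotonic() - t0
    while True:
        t0 = time.monotonic()
        status = request.get_operation_status()
        longest_call = max(longest_call, time.monotonic() - t0)
        if status == "FINISHED":
            return longest_call
        time.sleep(poll_interval_s)
```

For example, with `lambda: time.sleep(0.3)` standing in for a slow DDL, `run_client(RequestState(work, async_mode=True))` returns a value far below 0.3 s, while the synchronous variant returns at least 0.3 s. The client's total wall time is roughly the same in both cases; what changes is that in asynchronous mode every individual call returns quickly, so no call sits waiting on the connection for the whole DDL.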
Limitations:

This patch does not handle a potential AWS NLB-type timeout for LOAD DATA (IMPALA-10967).

Testing:

1. Added new asynchronous DDL unit tests with HS2, HS2-HTTP, Beeswax and JDBC clients.
2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)


git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/34
--
To view, visit http://gerrit.cloudera.org:8080/17872

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>