Hello Michael Ho, Pranay Singh, Quanlong Huang, Lars Volker, Fredy Wijaya, Todd Lipcon, Bikramjeet Vig, Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10744

to look at the new patch set (#15).

Change subject: IMPALA-1760: Implement shutdown command
......................................................................

IMPALA-1760: Implement shutdown command

This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).

Details:
* In order to allow future admin commands, this is implemented with
  function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
  down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
  ":shutdown('hostname')", so that non-coordinators can be shut down
  and for the convenience of the client, which does not have to
  connect to the specific impalad. There is no assumption that the
  other impalad is registered in the statestore; just that the
  coordinator can connect to the other daemon's thrift endpoint.
  This simplifies things and allows shutdown in various important
  cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
  slower shutdown by specifying a deadline in seconds after the
  statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting
   queries to this coordinator via some out-of-band mechanism,
   e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
   which stop scheduling fragment instances on this daemon
   (if an executor).
4. the quiesce period (which is ideally set to the AC queueing
   delay plus some additional leeway) expires
5. once the daemon is drained (i.e. no fragments, no registered
   queries), it shuts itself down.
6. If the daemon does not drain (e.g. rogue clients, long-running
   queries), after a longer timeout it will shut down anyway.

What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
  and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
  there is an out-of-band mechanism to prevent submission of more
  queries to the shut down coordinator. If queries are submitted to
  a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
  slow down but not prevent eventual shutdown.

Limitations:
* The quiesce period needs to be configured to be greater than the
  latency of statestore updates + scheduling + admission +
  coordinator startup. Otherwise a coordinator may send a
  fragment instance to the shutting down impalad. (We could
  automate this configuration as a follow-on)
* The quiesce period means a minimum latency for shutdown,
  even if the cluster is idle.
* We depend on the statestore detecting the process going down
  if queries are still running on that backend when the timeout
  expires. This may still be subject to existing problems,
  e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
  queries are running.
* End-to-end test of shutting down a coordinator
  - New queries cannot be started on coord, existing queries continue
    to run
  - Exercises various Beeswax and HS2 operations.

Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
---
M be/src/runtime/backend-client.h
M be/src/runtime/client-cache.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/scheduling/scheduler.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/fault-injection-util.h
M be/src/util/default-path-handlers.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/StatestoreService.thrift
M common/thrift/Types.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/LimitElement.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M tests/common/impala_cluster.py
M tests/common/impala_service.py
M tests/custom_cluster/test_restart_services.py
M tests/hs2/hs2_test_suite.py
M tests/hs2/test_fetch_first.py
M www/backends.tmpl
M www/root.tmpl
36 files changed, 1,088 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/10744/15
--
To view, visit http://gerrit.cloudera.org:8080/10744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Gerrit-Change-Number: 10744
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
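
Illustrative note on workflow steps 4 to 6 of the commit message: the
exit decision can be read as a small predicate over elapsed time and
the daemon's outstanding work. The sketch below is not taken from the
patch; the names DaemonState, should_exit, quiesce_period_s and
deadline_s are hypothetical and chosen only for illustration, and the
real logic lives in the C++ backend listed above.

```python
from dataclasses import dataclass


@dataclass
class DaemonState:
    """Hypothetical snapshot of a daemon after shutdown has been requested."""
    seconds_since_shutdown_start: float
    running_fragments: int
    registered_queries: int


def should_exit(state, quiesce_period_s, deadline_s):
    """Return True if the daemon may exit now (assumed reading of steps 4-6)."""
    elapsed = state.seconds_since_shutdown_start
    # Step 4: never exit before the quiesce period has expired, because
    # coordinators may still be scheduling fragments on this daemon.
    if elapsed < quiesce_period_s:
        return False
    # Step 6: once the longer deadline has passed, exit even if not drained.
    if elapsed >= deadline_s:
        return True
    # Step 5: between the two thresholds, exit only once drained.
    return state.running_fragments == 0 and state.registered_queries == 0
```

Under this reading, an idle daemon still waits out the full quiesce
period, which matches the minimum shutdown latency noted under
Limitations, and a daemon with stuck fragments exits once the deadline
is reached.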