[jira] [Commented] (DRILL-4286) Have an ability to put server in quiescent mode of operation

Paul Rogers (JIRA) Sun, 27 Aug 2017 14:29:25 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143227#comment-16143227
 ]


Paul Rogers commented on DRILL-4286:
------------------------------------

There may be just a bit of confusion about the purpose of this feature. Drill 
already provides the means to take down a Drillbit quickly: just kill the 
process. (Drill's {{drillbit.sh}} script sends a {{SIGTERM}}, waits a while, 
then sends a {{SIGKILL}}.) So, if fast exit is the goal, we already have that.
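
To make the "SIGTERM, wait, SIGKILL" sequence concrete, here is a minimal 
Python sketch of that pattern. It is not the actual {{drillbit.sh}} logic; the 
function name {{stop_process}} and the {{grace_seconds}} parameter are made up 
for illustration, and the real script's wait time is not assumed here.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Stop a child process: polite request first, forced kill as a fallback.

    Hypothetical helper for illustration only. Returns the name of the
    signal that the process exited after.
    """
    proc.terminate()                      # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)  # give the process time to exit
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()                       # sends SIGKILL on POSIX
        proc.wait()                       # reap the process
        return "SIGKILL"
```

Either way the process is gone within roughly {{grace_seconds}}, and so is any 
work it was doing at the time.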

The problem, of course, is that such a fast exit causes all in-flight queries 
to die. Why? Drill is fully symmetric: all queries launch fragments on all 
nodes. Drill uses an "in-memory" DAG model, meaning data flows directly from one 
fragment (node) to another with no persistence between fragments (stages). As a 
result, Drill cannot restart a failed fragment: there is no way to identify 
which data has to be discarded and reread. The only choice is to restart the 
entire query.

Drill is designed to assume that end users can retry (short) queries when nodes 
fail. Not elegant, but not entirely crazy. (I'm sure the end user does not 
consider this an acceptable solution, however.)

When running longer queries, taking down a node causes all progress to be lost. 
Say a query has run for an hour. Taking a node offline loses that work.

The graceful shutdown feature avoids the above problems. The "victim" drillbit 
stays up as long as needed to complete in-flight queries. Now, in the worst 
case, the victim might never shut down because new queries keep arriving. To 
avoid that, the change causes all Foreman nodes to stop sending fragments to the 
quiescent node. So, eventually the "victim" node drains and shuts down. All 
with no disruption to the end users running queries on Drill.
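
The drain behavior described above can be sketched as a toy model. This is 
only an illustration of the idea, not Drill's implementation: the {{Node}} 
class and the {{plan_query}} and {{quiesce}} functions are hypothetical names, 
and the model assumes, as stated earlier, that each query places fragments on 
every available node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a Drillbit (hypothetical, for illustration)."""
    name: str
    accepting: bool = True                     # False once quiescent
    up: bool = True                            # False once shut down
    running: set = field(default_factory=set)  # ids of in-flight queries

    def finish(self, query_id: str) -> None:
        self.running.discard(query_id)
        # A quiescent node shuts down once its last in-flight query drains.
        if not self.accepting and not self.running:
            self.up = False


def plan_query(query_id: str, nodes: list) -> list:
    """Foreman side: place fragments only on nodes still accepting work."""
    targets = [n for n in nodes if n.up and n.accepting]
    for n in targets:
        n.running.add(query_id)
    return [n.name for n in targets]


def quiesce(node: Node) -> None:
    """Stop new work from arriving; shut down at once if already idle."""
    node.accepting = False
    if not node.running:
        node.up = False
```

In this model, a query planned before {{quiesce}} keeps the "victim" up until 
it finishes, while queries planned afterward skip that node entirely, so the 
node is guaranteed to drain eventually.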

Now, if you get tired of waiting for a long-running query to complete, then you 
can still kill the "victim" drillbit, which will kill the remaining, undrained 
queries.

In short, the graceful shutdown is a pretty good compromise to assist both 
users and admins given the way Drill works today. We can certainly imagine ways 
to improve Drill (such as finding a way to restart individual fragments, or 
automatic retry of failed queries), but that requires much more work and is 
saved for a later effort.

All that said, within the confines of this change, all improvement suggestions 
are welcome. In particular, we don't run a production Drill shop, so we'd love 
to hear from those users that do: how might this feature be improved to work 
better in a production environment?

> Have an ability to put server in quiescent mode of operation
> ------------------------------------------------------------
>
>                 Key: DRILL-4286
>                 URL: https://issues.apache.org/jira/browse/DRILL-4286
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Flow
>            Reporter: Victoria Markman
>            Assignee: Venkata Jyothsna Donapati
>
> I think drill will benefit from mode of operation that is called "quiescent" 
> in some databases. 
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to 
> restrict access to the database server without interrupting current 
> processing. After you perform this task, the database server sets a flag that 
> prevents new sessions from gaining access to the database server. The current 
> sessions are allowed to finish processing. After you initiate the mode 
> change, it cannot be canceled. During the mode change from online to 
> quiescent, the database server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
