[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689791#comment-16689791
 ] 

ASF GitHub Bot commented on DRILL-6039:
---------------------------------------

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234301751
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java
 ##########
 @@ -433,6 +442,22 @@ public void run() {
     public StatusThread() {
       // assume this thread is created by a non-daemon thread
       setName("WorkManager.StatusThread");
+  }
+  private void pollShutdown(Drillbit drillbit) throws IOException, 
InterruptedException {
+      final Path path = 
FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      try (final WatchService watchService = 
FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, 
StandardWatchEventKinds.ENTRY_CREATE);
+        while (true) {
+          final WatchKey wk = watchService.take();
 
 Review comment:
   This design is not correct and has issues.
   
   - You are using `WorkManager` `StatusThread` for registering a 
`WatchService` and then doing a blocking call `take()` which will make this 
thread stuck until there is any event seen by `WatchService`. Hence 
`StatusThread` will not be able to perform it's actual job of sending status of 
`runningFragments` to it's `Foreman`.
   - The `pollShutdown` method is calling `Drillbit.close()` which is in 
`StatusThread`. Now `Drillbit.close() `calls close on `WorkManager` and which 
calls `interrupt` on `StatusThread`. There is a cycle here. What it means to 
call interrupt on itself ? Usually interrupt is used by another thread to wake 
up a blocking thread. I think based on interrupt call the interrupt flag of 
status thread will be set. Then when it comes out of `pollShutdown` and calls 
sleep it will hit interrupted exception. It might work but I don't like the 
idea where WorkManager which is part of Drillbit has a thread which calling 
`Drillbit.close()`. 
   
   Please create a separate thread within Drillbit class itself to do this. But 
in `Drillbit::close()` method you need to still `interrupt` this new thread for 
cases when `close` it called from some other path. But you can maintain a flag 
which will indicate if `Thread.interrupt()` should be called or not.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> drillbit.sh graceful_stop does not wait for fragments to complete before 
> stopping the drillbit
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6039
>                 URL: https://issues.apache.org/jira/browse/DRILL-6039
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.3.0
>            Reporter: Krystal
>            Assignee: Venkata Jyothsna Donapati
>            Priority: Major
>             Fix For: 1.15.0
>
>
> git.commit.id.abbrev=eb0c403
> I have 3-nodes cluster with drillbits running on each node.  I kicked off a 
> long running query.  In the middle of the query, I did a "./drillbit.sh 
> graceful_stop" on one of the non-foreman node.  The node was stopped within a 
> few seconds and the query failed with error:
> Error: SYSTEM ERROR: IOException: Filesystem closed
> Fragment 4:15



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to