pagrawal10 commented on issue #15944:
URL: https://github.com/apache/druid/issues/15944#issuecomment-1976847918

   Hey Amatya,
   I debugged the logs further and this is the flow of events:
   Handoff started for task A2 at 15:23. The task then waited for the handoff 
to complete.
   At 15:26:19, A1 completed and sent the stop signal to A2. When A2 received 
the stop signal, it started shutting down immediately and dropped the segment 
it was handling. The segment was dropped successfully. The task could not 
have been waiting for a historical to pick up the segment, as the segment had 
already been loaded onto historicals by the replica task, which had completed.
   At 15:26:23, the discoverTasks() function ran and put A2 in a new Pending 
Completion task group.
   The task never finished shutting down and stayed stuck until the timeout 
elapsed. Between 15:27 and 15:56, the only log lines for the task are ones 
stating its current offset.
   At 15:56, we see this exception:
   java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.druid.java.util.common.RE: Current thread is interrupted after [0] tries
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.druid.indexing.overlord.ThreadingTaskRunner$2.call(ThreadingTaskRunner.java:323)
        at org.apache.druid.indexing.overlord.ThreadingTaskRunner$2.call(ThreadingTaskRunner.java:315)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.druid.java.util.common.RE: Current thread is interrupted after [0] tries
        at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:232)
        at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:152)
        ... 4 more
   Caused by: java.lang.RuntimeException: org.apache.druid.java.util.common.RE: Current thread is interrupted after [0] tries
        at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:156)
        at org.apache.druid.storage.s3.S3TaskLogs.pushTaskReports(S3TaskLogs.java:141)
        at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:223)
        ... 5 more
   Caused by: org.apache.druid.java.util.common.RE: Current thread is interrupted after [0] tries
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:148)
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81)
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:163)
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:153)
        at org.apache.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:101)
        at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:147)
        ... 7 more
   I have gone through the code but could not pinpoint where the task thread 
was stuck or the exception was swallowed. Can you please take a look?
   
   It seems that discoverTasks() interfered with the graceful shutdown 
process.
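
   For reference, the "interrupted after [0] tries" wording in the trace is 
consistent with a retry helper that finds the thread's interrupt flag already 
set before its first attempt, so the log push would fail without ever being 
tried. Below is a minimal Java sketch of that pattern. The class 
InterruptedRetrySketch and its retry() helper are hypothetical stand-ins 
written for illustration; they are not Druid's RetryUtils, and the assumption 
that the flag is checked before each attempt is mine, not confirmed from the 
Druid source.

```java
import java.util.concurrent.Callable;

public class InterruptedRetrySketch {

    // Hypothetical retry helper (NOT Druid's RetryUtils). Assumption: the
    // interrupt flag is checked before every attempt, so an already
    // interrupted thread fails before the operation is ever invoked.
    static <T> T retry(Callable<T> operation, int maxTries) throws Exception {
        int tries = 0;
        while (true) {
            if (Thread.currentThread().isInterrupted()) {
                throw new RuntimeException(
                    "Current thread is interrupted after [" + tries + "] tries");
            }
            try {
                tries++;
                return operation.call();
            } catch (Exception e) {
                if (tries >= maxTries) {
                    throw e; // out of attempts, surface the last failure
                }
            }
        }
    }

    // Simulates a task thread that was interrupted (for example by a
    // shutdown) before it tried to push its reports.
    static String demo() {
        Thread.currentThread().interrupt();
        try {
            retry(() -> "pushed", 3);
            return "no exception";
        } catch (Exception e) {
            return e.getMessage();
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
    }

    public static void main(String[] args) {
        // prints "Current thread is interrupted after [0] tries"
        System.out.println(demo());
    }
}
```

   If the real helper behaves like this, the exception at 15:56 would point 
to the interrupt having been delivered before the push began, rather than to 
an S3 failure, which may help narrow down where the thread was stuck.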


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

