David Manning created HBASE-28663:
-------------------------------------
Summary: CanaryTool continues executing and scanning after timeout
Key: HBASE-28663
URL: https://issues.apache.org/jira/browse/HBASE-28663
Project: HBase
Issue Type: Bug
Components: canary
Affects Versions: 2.0.0, 3.0.0
Reporter: David Manning
Assignee: David Manning
If you run the {{CanaryTool}} in region mode until it reaches the configured
timeout, the logs and sink results show that it can keep executing and
scanning for up to 10 seconds after the timeout.
This is because the RegionTasks have already been submitted to an
ExecutorService which continues execution after timeout, and the Monitor
continues execution on a separate thread.
The 10 seconds is seen in HBase 2.x, at least, because {{runMonitor}} will
close the {{Connection}}, and that process
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
leads to {{ConnectionImplementation#close}}
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300]).
Inside {{shutdownPools}}, we may wait the full 10 seconds of
{{awaitTermination}} if client operations are still in progress.
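The wait described above can be reproduced with plain {{java.util.concurrent}}
classes. The following is a minimal sketch of the general
shutdown-then-await pattern, not the HBase code: the class and method names
are hypothetical, and the durations are shortened so it runs quickly. A task
that is still in flight when {{shutdown}} is called keeps running, so
{{awaitTermination}} blocks for its whole timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not taken from ConnectionImplementation.
public class AwaitTerminationSketch {

  /** Returns the milliseconds spent closing a pool that has one task in flight. */
  static long closeAndMeasureMillis(long taskMillis, long awaitMillis) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    pool.execute(() -> {
      started.countDown();
      try {
        Thread.sleep(taskMillis); // stands in for a client operation in progress
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    long begin = System.nanoTime();
    try {
      started.await(); // make sure the task is running before closing
      begin = System.nanoTime();
      pool.shutdown(); // rejects new work, but lets the running task continue
      if (!pool.awaitTermination(awaitMillis, TimeUnit.MILLISECONDS)) {
        pool.shutdownNow(); // only now is the running task interrupted
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
  }

  public static void main(String[] args) {
    // A 5 s task with a 300 ms await: the close blocks for the await timeout.
    System.out.println("close took ms: " + closeAndMeasureMillis(5000, 300));
  }
}
```

With a 10 second await timeout and long-running scans, the same pattern would
account for the delay reported here.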
The scenario can be improved by simply interrupting the monitor thread. It will
often be in an {{invokeAll}} call in a {{sniff}} method, so the interrupt will
reach the client threads and generally shut things down properly and promptly.
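The effect of interrupting a thread that is blocked in {{invokeAll}} can be
sketched as follows. This is an illustration with hypothetical names, not the
{{CanaryTool}} code: the sleeping tasks stand in for region scans, and the
"monitor" thread is a plain {{Thread}}. It relies on the documented
{{ExecutorService#invokeAll}} behavior that unfinished tasks are cancelled
when the caller is interrupted while waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the real monitor and tasks live in CanaryTool.
public class InterruptMonitorSketch {

  /** Returns how many tasks saw an interrupt after the monitor thread was interrupted. */
  static int interruptedTasks(int taskCount) {
    ExecutorService pool = Executors.newFixedThreadPool(taskCount);
    AtomicInteger interrupted = new AtomicInteger();
    CountDownLatch started = new CountDownLatch(taskCount);
    CountDownLatch finished = new CountDownLatch(taskCount);

    List<Callable<Void>> tasks = new ArrayList<>();
    for (int i = 0; i < taskCount; i++) {
      tasks.add(() -> {
        started.countDown();
        try {
          Thread.sleep(60_000); // stands in for a long-running scan
        } catch (InterruptedException e) {
          interrupted.incrementAndGet();
        } finally {
          finished.countDown();
        }
        return null;
      });
    }

    // The monitor thread blocks in invokeAll until it is interrupted.
    Thread monitor = new Thread(() -> {
      try {
        pool.invokeAll(tasks);
      } catch (InterruptedException e) {
        // invokeAll has already cancelled the unfinished tasks at this point.
      }
    });
    monitor.start();

    try {
      started.await();      // all tasks are running
      monitor.interrupt();  // simulate the timeout handler
      finished.await(5, TimeUnit.SECONDS);
      monitor.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdownNow();
    }
    return interrupted.get();
  }

  public static void main(String[] args) {
    System.out.println("interrupted tasks: " + interruptedTasks(4));
  }
}
```

Tasks that are blocked in interruptible calls stop quickly this way. Tasks
that never check the interrupt status would not, which is one reason an
explicit shutdown signal is also worth considering.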
However, we could be more robust by also watching for a shutdown signal in the
various tasks such as {{RegionTask}}.
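One possible shape for such a shutdown signal is a shared flag that each task
checks between units of work. The sketch below is only an assumption about
how this could look: {{scanUntilStopped}}, the {{AtomicBoolean}} flag, and the
per-row callback are hypothetical and do not exist in {{CanaryTool}}.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical sketch of a task that watches for a shutdown signal.
public class ShutdownAwareTaskSketch {

  /**
   * Processes rows 0..totalRows-1, checking the flag before each row.
   * Returns the number of rows actually processed.
   */
  static int scanUntilStopped(int totalRows, AtomicBoolean stopRequested, IntConsumer onRow) {
    int scanned = 0;
    for (int row = 0; row < totalRows; row++) {
      if (stopRequested.get()) {
        break; // the timeout fired: stop instead of scanning further
      }
      onRow.accept(row); // stands in for reading one row
      scanned++;
    }
    return scanned;
  }

  public static void main(String[] args) {
    AtomicBoolean stop = new AtomicBoolean(false);
    // Simulate the timeout arriving while row 2 is being processed.
    int scanned = scanUntilStopped(10, stop, row -> {
      if (row == 2) {
        stop.set(true);
      }
    });
    System.out.println("rows scanned before stopping: " + scanned);
  }
}
```

A flag like this complements the interrupt: it also stops tasks that are
queued but have not started, and tasks that are between blocking calls.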