Haiyang Hu created HDFS-17250:
---------------------------------
Summary: EditLogTailer#triggerActiveLogRoll should handle thread
Interrupted
Key: HDFS-17250
URL: https://issues.apache.org/jira/browse/HDFS-17250
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Haiyang Hu
Assignee: Haiyang Hu
*Issue:*
When the NameNode attempts to trigger a log roll and the cachedActiveProxy
points to a NameNode that has been shut down, it cannot establish a network
connection. The call then blocks in the socket connection phase, which has a
timeout of 90 seconds. Because the asynchronous "Triggering log roll" call
waits only 60 seconds, it times out first and initiates a "cancel" operation.
The cancel interrupts the executing thread, which then throws a
"java.io.InterruptedIOException".
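The timeout-then-cancel sequence can be reproduced outside HDFS with plain java.util.concurrent. The following is a minimal sketch, not the EditLogTailer code: the class and method names are hypothetical, and Thread.sleep stands in for the blocked socket connect. It shows that when the wait on the Future times out before the work finishes, cancel(true) delivers an interrupt to the worker thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class; it is not part of Hadoop.
public class CancelInterruptSketch {

    /**
     * Runs a task that blocks for workMillis, waits at most waitMillis for it,
     * and cancels it on timeout. Returns true if the worker saw an interrupt.
     */
    public static boolean workerInterruptedAfterTimeout(long workMillis, long waitMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean interrupted = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);
        try {
            Future<?> future = executor.submit(() -> {
                try {
                    // Stands in for the socket connect that blocks on a dead node.
                    Thread.sleep(workMillis);
                } catch (InterruptedException e) {
                    interrupted.set(true);
                } finally {
                    done.countDown();
                }
            });
            try {
                // The caller's wait is shorter than the blocking work.
                future.get(waitMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // cancel(true) interrupts the thread that is running the task.
                future.cancel(true);
            }
            done.await(5, TimeUnit.SECONDS);
            return interrupted.get();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With a short wait and long work, for example workerInterruptedAfterTimeout(2000, 100), the worker is interrupted. When the work finishes within the wait, no cancel is issued.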
Currently, the logic does not handle the interrupted signal. Because the
"getActiveNodeProxy" method has not yet reached its maximum retry limit, the
overall execution does not exit and instead goes on to call "rollEditLog" on
the next NameNode in the list. However, when it tries to establish a socket
connection, it throws a "java.nio.channels.ClosedByInterruptException" because
the thread is still in an "Interrupted" state.
This cycle repeats until the maximum retry limit (nnCount * maxRetries) is
reached, and only then does the execution exit.
However, in the next cycle of "Triggering log roll," it traverses the NameNode
list again and encounters the same issue, because the cachedActiveProxy still
points to the shut-down NameNode.
As a result, the NameNode is never able to complete the "Triggering log roll"
operation.
To fix this, we need to handle the case where the thread has been interrupted
and exit the execution instead of retrying.
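One way to picture the proposed behavior is a retry loop that stops as soon as it sees an interrupt. The following is a minimal sketch and not the actual patch: InterruptAwareRetrySketch, RollCall and tryRoll are hypothetical names, and the RollCall interface stands in for the "rollEditLog" RPC. The only parts taken from the description above are the retry bound of nnCount * maxRetries and the idea of exiting on interruption.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.List;

// Hypothetical illustration class; it is not part of Hadoop.
public class InterruptAwareRetrySketch {

    /** Hypothetical stand-in for an RPC such as rollEditLog on one NameNode. */
    public interface RollCall {
        void roll(String nameNode) throws IOException;
    }

    /**
     * Tries the nodes round-robin, at most nodes.size() * maxRetries times.
     * Returns the number of attempts made on success. Exits immediately,
     * without trying further nodes, once the thread is interrupted.
     */
    public static int tryRoll(List<String> nodes, int maxRetries, RollCall call)
            throws IOException {
        int attempts = 0;
        int limit = nodes.size() * maxRetries;
        IOException last = null;
        while (attempts < limit) {
            // Check before each attempt: a connect on an interrupted thread
            // would fail anyway, so retrying is pointless.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException(
                    "interrupted after " + attempts + " attempts");
            }
            String node = nodes.get(attempts % nodes.size());
            attempts++;
            try {
                call.roll(node);
                return attempts;
            } catch (InterruptedIOException e) {
                // Restore the flag so callers can see it, then stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (IOException e) {
                // Ordinary failure: remember it and move on to the next node.
                last = e;
            }
        }
        throw new IOException(
            "no NameNode accepted the roll after " + attempts + " attempts", last);
    }
}
```

In this sketch an interrupted call is made once and then abandoned, whereas an ordinary IOException still uses the full retry budget across all nodes.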