Stephen O'Donnell created HDDS-4304:
---------------------------------------
Summary: Close Container event can fail if pipeline is removed
Key: HDDS-4304
URL: https://issues.apache.org/jira/browse/HDDS-4304
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: SCM
Affects Versions: 1.1.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
If you call `pipelineManager.finalizeAndDestroyPipeline()` with
onTimeout=false, then the finalizePipeline call fires a closeContainer
event for every container on the pipeline. These events are handled
asynchronously.
However, immediately after that, the `destroyPipeline(...)` call is made. This
will remove the pipeline details from the various maps / stores.
Then the closeContainer events get processed, and they attempt to remove the
container from the pipeline. However, as the pipeline has already been
destroyed, this throws a PipelineNotFoundException and the close container
events never get sent to the DNs:
{code}
2020-10-01 15:44:18,838
[EventQueue-CloseContainerForCloseContainerEventHandler] INFO
container.CloseContainerEventHandler: Close container Event triggered for
container : #2
2020-10-01 15:44:18,842
[EventQueue-CloseContainerForCloseContainerEventHandler] ERROR
container.CloseContainerEventHandler: Failed to close the container #2.
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException:
PipelineID=59e5ae16-f1fe-45ff-9044-dd237b0e91c6 not found
at
org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removeContainerFromPipeline(PipelineStateMap.java:372)
at
org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removeContainerFromPipeline(PipelineStateManager.java:111)
at
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removeContainerFromPipeline(SCMPipelineManager.java:413)
at
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:352)
at
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:331)
at
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:66)
at
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:45)
at
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor
{code}
The simple solution is to catch the PipelineNotFoundException and ignore it,
so that the close container event still reaches the DNs.
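
As a rough illustration of that idea, here is a minimal, self-contained sketch. It does not use the real SCM classes: `PipelineMap`, `CloseHandler`, `onCloseContainer` and `sentCloseCommands` are hypothetical stand-ins, and where exactly the catch belongs in the actual code path is an assumption. It only models the ordering described above (pipeline destroyed first, close event handled afterwards) and shows the handler carrying on when the pipeline is already gone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the race; these are NOT the real SCM classes.
public class CloseContainerSketch {

  /** Stand-in for org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException. */
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String pipelineId) {
      super("PipelineID=" + pipelineId + " not found");
    }
  }

  /** Stand-in for the pipeline -> containers bookkeeping kept by SCM. */
  static class PipelineMap {
    private final Map<String, Set<Long>> containersByPipeline = new HashMap<>();

    void addContainer(String pipelineId, long containerId) {
      containersByPipeline
          .computeIfAbsent(pipelineId, k -> new HashSet<>())
          .add(containerId);
    }

    void destroyPipeline(String pipelineId) {
      containersByPipeline.remove(pipelineId);
    }

    void removeContainerFromPipeline(String pipelineId, long containerId)
        throws PipelineNotFoundException {
      Set<Long> containers = containersByPipeline.get(pipelineId);
      if (containers == null) {
        // Mirrors the failure seen in the stack trace above.
        throw new PipelineNotFoundException(pipelineId);
      }
      containers.remove(containerId);
    }
  }

  /** Stand-in for the close container event handler. */
  static class CloseHandler {
    private final PipelineMap pipelines;
    // Records the containers for which a close command would go to the DNs.
    final List<Long> sentCloseCommands = new ArrayList<>();

    CloseHandler(PipelineMap pipelines) {
      this.pipelines = pipelines;
    }

    void onCloseContainer(String pipelineId, long containerId) {
      try {
        pipelines.removeContainerFromPipeline(pipelineId, containerId);
      } catch (PipelineNotFoundException e) {
        // The pipeline was already destroyed, so there is nothing left to
        // detach the container from. Ignore and keep processing the event.
      }
      sentCloseCommands.add(containerId);
    }
  }

  public static void main(String[] args) {
    PipelineMap pipelines = new PipelineMap();
    pipelines.addContainer("p1", 2L);
    CloseHandler handler = new CloseHandler(pipelines);

    // Same ordering as in the report: destroy first, handle the event later.
    pipelines.destroyPipeline("p1");
    handler.onCloseContainer("p1", 2L);

    System.out.println("close commands sent for containers: "
        + handler.sentCloseCommands);
  }
}
```

In this model the close command for container #2 is still recorded even though its pipeline no longer exists. Without the catch, the exception would propagate out of the handler before the command is recorded.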