Stephen O'Donnell created HDDS-4304:
---------------------------------------

             Summary: Close Container event can fail if pipeline is removed
                 Key: HDDS-4304
                 URL: https://issues.apache.org/jira/browse/HDDS-4304
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: SCM
    Affects Versions: 1.1.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


If you call `pipelineManager.finalizeAndDestroyPipeline()` with 
onTimeout=false, the finalizePipeline call fires a closeContainer event 
for every container on the pipeline. These events are handled 
asynchronously.

However, immediately after that, the `destroyPipeline(...)` call is made. This 
will remove the pipeline details from the various maps / stores.

When the closeContainer events are then processed, they attempt to remove the 
container from the pipeline. However, because the pipeline has already been 
destroyed, this throws an exception and the close container events never get 
sent to the DNs:

{code}
2020-10-01 15:44:18,838 
[EventQueue-CloseContainerForCloseContainerEventHandler] INFO 
container.CloseContainerEventHandler: Close container Event triggered for 
container : #2
2020-10-01 15:44:18,842 
[EventQueue-CloseContainerForCloseContainerEventHandler] ERROR 
container.CloseContainerEventHandler: Failed to close the container #2.
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
PipelineID=59e5ae16-f1fe-45ff-9044-dd237b0e91c6 not found
        at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removeContainerFromPipeline(PipelineStateMap.java:372)
        at 
org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removeContainerFromPipeline(PipelineStateManager.java:111)
        at 
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removeContainerFromPipeline(SCMPipelineManager.java:413)
        at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:352)
        at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:331)
        at 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:66)
        at 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:45)
        at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor
{code}
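
The ordering problem can be reproduced in isolation. The following is a minimal sketch, not SCM code: `RaceSketch`, its `pipelines` map, `eventQueue` and result lists are hypothetical stand-ins for the pipeline state map and the event queue, and a local `PipelineNotFoundException` stands in for the real one. Events are queued first, the pipeline is removed, and only then are the events drained, which mirrors the sequence described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for the sequence in this report; not the real SCM classes.
class RaceSketch {
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String id) {
      super("PipelineID=" + id + " not found");
    }
  }

  // Pipeline ID -> IDs of the containers on that pipeline.
  final Map<String, Set<Long>> pipelines = new HashMap<>();
  // Close-container events waiting to be handled later.
  final Queue<Runnable> eventQueue = new ArrayDeque<>();
  // Containers for which a close command would have gone to the datanodes.
  final List<Long> closeCommandsSent = new ArrayList<>();
  // Containers whose close event failed.
  final List<Long> failedCloses = new ArrayList<>();

  void removeContainerFromPipeline(String pipelineId, long containerId)
      throws PipelineNotFoundException {
    Set<Long> containers = pipelines.get(pipelineId);
    if (containers == null) {
      throw new PipelineNotFoundException(pipelineId);
    }
    containers.remove(containerId);
  }

  // Queues one close event per container, then removes the pipeline
  // before any of the queued events has run.
  void finalizeAndDestroyPipeline(String pipelineId) {
    for (long containerId : new ArrayList<Long>(pipelines.get(pipelineId))) {
      eventQueue.add(() -> {
        try {
          removeContainerFromPipeline(pipelineId, containerId);
          closeCommandsSent.add(containerId);
        } catch (PipelineNotFoundException e) {
          // The close command is never sent for this container.
          failedCloses.add(containerId);
        }
      });
    }
    pipelines.remove(pipelineId);
  }

  // Runs the queued events, as the asynchronous handler would do later.
  void drainEvents() {
    Runnable event;
    while ((event = eventQueue.poll()) != null) {
      event.run();
    }
  }

  public static void main(String[] args) {
    RaceSketch sketch = new RaceSketch();
    sketch.pipelines.put("p1", new HashSet<>(List.of(1L, 2L)));
    sketch.finalizeAndDestroyPipeline("p1");
    sketch.drainEvents();
    System.out.println("sent=" + sketch.closeCommandsSent.size()
        + " failed=" + sketch.failedCloses.size());
  }
}
```

In this sketch every queued event fails, because the pipeline entry is gone before the first event runs. The real handler runs on a separate thread, so there the outcome depends on timing.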

The simple solution is to catch the PipelineNotFoundException and ignore it, 
so that closing the container can still proceed.
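
As a rough illustration of that approach, here is a minimal sketch assuming a handler shaped like the one in the stack trace. `TolerantCloseSketch`, `onCloseContainer` and `closeCommandsSent` are hypothetical names, and where exactly the catch belongs in SCM is not stated in this report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed fix; not the actual patch.
class TolerantCloseSketch {
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String id) {
      super("PipelineID=" + id + " not found");
    }
  }

  // Pipeline ID -> IDs of the containers on that pipeline.
  final Map<String, Set<Long>> pipelines = new HashMap<>();
  // Containers for which a close command would go to the datanodes.
  final List<Long> closeCommandsSent = new ArrayList<>();

  void removeContainerFromPipeline(String pipelineId, long containerId)
      throws PipelineNotFoundException {
    Set<Long> containers = pipelines.get(pipelineId);
    if (containers == null) {
      throw new PipelineNotFoundException(pipelineId);
    }
    containers.remove(containerId);
  }

  // Handles one close-container event.
  void onCloseContainer(String pipelineId, long containerId) {
    try {
      removeContainerFromPipeline(pipelineId, containerId);
    } catch (PipelineNotFoundException e) {
      // The pipeline was already destroyed, so there is nothing left to
      // detach the container from. Ignore and carry on with the close.
    }
    closeCommandsSent.add(containerId);
  }
}
```

With this shape, a missing pipeline no longer prevents the close command from being recorded for the container.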



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
