[ 
https://issues.apache.org/jira/browse/HDDS-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-9658:
------------------------------
    Target Version/s: 2.0.0, 1.4.2

> EC: Recovering container cleanup at DN start is not happening due to NPE.
> -------------------------------------------------------------------------
>
>                 Key: HDDS-9658
>                 URL: https://issues.apache.org/jira/browse/HDDS-9658
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Uma Maheswara Rao G
>            Assignee: Sumit Agrawal
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>
> [~SaketaChalamchala], [~swamirishi] and I were investigating an issue.
> In our investigation, we found that the cleanup of recovering containers at 
> DN startup fails with an NPE.
> {code:java}
> 2023-11-07 16:23:49,659 [3700a088-6cef-4700-993f-8f78b8f6d103-ContainerReader-0] ERROR ozoneimpl.ContainerReader (ContainerReader.java:readVolume(168)) - Failed to load container from /Users/umagangumalla/Work/repos/Apache/hadoop-ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-1791ad9a-e8d5-4639-b155-e3e62d68600c/datanode-7/data-0/containers/hdds/1791ad9a-e8d5-4639-b155-e3e62d68600c/current/containerDir0/2
> java.lang.NullPointerException
> at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.getTmpDirectoryPath(KeyValueContainerUtil.java:513)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.moveToDeletedContainerDir(KeyValueContainerUtil.java:491)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.removeContainer(KeyValueContainerUtil.java:142)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.cleanupContainer(ContainerReader.java:251)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:217)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:190){code}
> On startup, the DN runs the following code in 
> ContainerReader.java#verifyAndFixupContainerData to clean up recovering 
> containers:
> ....
> if (kvContainer.getContainerState() == RECOVERING) {
>   if (shouldDeleteRecovering) {
>     cleanupContainer(hddsVolume, kvContainer);
>     kvContainer.delete();
>     LOG.info("Delete recovering container {}.",
>         kvContainer.getContainerData().getContainerID());
>   }
>   return;
> }
> The likely reason is that createTmpDirs has not been called by the time this 
> cleanup runs, so deletedContainerDir is still null, which causes the NPE.
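
The ordering problem described above can be illustrated with a minimal, standalone Java sketch. It is not the Ozone code and not necessarily the fix applied for this issue: the class `TmpDirOrderingSketch`, the nested `SketchVolume`, the methods `tmpPathUnguarded` and `tmpPathGuarded`, and the `tmp/deleted-containers` layout are all hypothetical names chosen for illustration. Only `createTmpDirs` and `deletedContainerDir` are borrowed from the description. The sketch assumes the directory field stays null until `createTmpDirs` runs, and shows one possible guard that initializes it before use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; none of these types exist in Ozone.
public class TmpDirOrderingSketch {

    // Stand-in for a volume whose tmp directories are created lazily.
    static final class SketchVolume {
        private final Path root;
        private Path deletedContainerDir; // stays null until createTmpDirs() runs

        SketchVolume(Path root) {
            this.root = root;
        }

        // Assumed layout: <root>/tmp/deleted-containers
        void createTmpDirs() throws IOException {
            deletedContainerDir = Files.createDirectories(
                root.resolve("tmp").resolve("deleted-containers"));
        }

        Path getDeletedContainerDir() {
            return deletedContainerDir;
        }
    }

    // Mirrors the failure mode: dereferences the field without checking it.
    static Path tmpPathUnguarded(SketchVolume volume, String containerName) {
        return volume.getDeletedContainerDir().resolve(containerName);
    }

    // One possible guard: create the tmp directories on demand before use.
    static Path tmpPathGuarded(SketchVolume volume, String containerName)
            throws IOException {
        if (volume.getDeletedContainerDir() == null) {
            volume.createTmpDirs();
        }
        return volume.getDeletedContainerDir().resolve(containerName);
    }

    public static void main(String[] args) throws IOException {
        SketchVolume volume =
            new SketchVolume(Files.createTempDirectory("sketch-volume"));
        try {
            tmpPathUnguarded(volume, "2");
        } catch (NullPointerException e) {
            System.out.println("NPE: cleanup ran before createTmpDirs");
        }
        System.out.println("Guarded path: " + tmpPathGuarded(volume, "2"));
    }
}
```

An alternative to the on-demand guard would be to make sure the tmp directories are created before any container on the volume is read; which approach suits the real startup sequence is for the actual patch to decide.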



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

