[ https://issues.apache.org/jira/browse/HDFS-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang updated HDFS-11729:
-------------------------------
    Attachment: HDFS-11729.001.patch

> Improve NNStorageRetentionManager failure handling.
> ---------------------------------------------------
>
>                 Key: HDFS-11729
>                 URL: https://issues.apache.org/jira/browse/HDFS-11729
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>         Attachments: HDFS-11729.001.patch
>
>
> Currently {{NNStorageRetentionManager}} will simply skip a storage directory
> if a problem is detected. Since checkpoint saving does not go through the
> same set of checks, this can lead to the space exhaustion seen in HDFS-11714.
> Instead of ignoring errors, it should handle them properly. One potential
> improvement is to catch the exception and report the storage directory
> failure using {{NNStorage.reportErrorsOnDirectories()}}.
> {{attemptRestoreRemovedStorage()}} will need extra checks, e.g. for the
> existence of a VERSION file.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
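
The handling proposed in the description above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch and not Hadoop code: `RetentionSketch`, `StorageDir`, `Storage`, `purgeOldStorage` and `purgeDirectory` are hypothetical stand-ins, and the two methods named after `reportErrorsOnDirectories()` and `attemptRestoreRemovedStorage()` only mimic the roles the issue describes, not the real `NNStorage` signatures. The `current/VERSION` layout is likewise assumed for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; every class here is a hypothetical stand-in. */
public class RetentionSketch {

  /** Hypothetical stand-in for one NameNode storage directory. */
  static final class StorageDir {
    final File root;
    StorageDir(File root) { this.root = root; }
    File currentDir() { return new File(root, "current"); }
    File versionFile() { return new File(currentDir(), "VERSION"); }
  }

  /** Hypothetical stand-in that tracks active and removed (failed) directories. */
  static final class Storage {
    final List<StorageDir> active = new ArrayList<>();
    final List<StorageDir> removed = new ArrayList<>();

    /** Plays the role the issue assigns to reportErrorsOnDirectories(). */
    void reportErrorsOnDirectories(List<StorageDir> failed) {
      for (StorageDir sd : failed) {
        if (active.remove(sd)) {
          removed.add(sd);
        }
      }
    }

    /** Restores a removed directory only if its VERSION file exists. */
    void attemptRestoreRemovedStorage() {
      List<StorageDir> restored = new ArrayList<>();
      for (StorageDir sd : removed) {
        if (sd.versionFile().isFile()) {
          restored.add(sd);
        }
      }
      removed.removeAll(restored);
      active.addAll(restored);
    }
  }

  /** Purges each active directory; failures are reported instead of skipped. */
  static int purgeOldStorage(Storage storage) {
    List<StorageDir> failed = new ArrayList<>();
    for (StorageDir sd : new ArrayList<>(storage.active)) {
      try {
        purgeDirectory(sd);
      } catch (IOException e) {
        failed.add(sd); // collect the failure rather than silently moving on
      }
    }
    storage.reportErrorsOnDirectories(failed);
    return failed.size();
  }

  /** Placeholder purge: only checks that the directory can be listed. */
  static void purgeDirectory(StorageDir sd) throws IOException {
    File[] files = sd.currentDir().listFiles();
    if (files == null) {
      throw new IOException("Cannot list " + sd.currentDir());
    }
    // Real retention logic would select and delete old images and edits here.
  }

  public static void main(String[] args) throws IOException {
    File good = Files.createTempDirectory("good").toFile();
    new File(good, "current").mkdirs();
    File bad = Files.createTempDirectory("bad").toFile(); // no current/ subdirectory

    Storage storage = new Storage();
    storage.active.add(new StorageDir(good));
    storage.active.add(new StorageDir(bad));

    int failed = purgeOldStorage(storage);
    System.out.println("failed=" + failed + " active=" + storage.active.size());

    storage.attemptRestoreRemovedStorage();
    System.out.println("still removed=" + storage.removed.size());
  }
}
```

In this sketch a directory that cannot be listed is moved to the removed list, so a later checkpoint would not keep writing to it unnoticed, and it is only restored once a VERSION file is present again.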