[ https://issues.apache.org/jira/browse/HDFS-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521002#comment-14521002 ]
Zhe Zhang commented on HDFS-8178:
---------------------------------

Thanks ATM for the helpful review!

After looking at HDFS-5919 more closely, we are actually trying to solve a different problem here. The objective of HDFS-5919 is solely to save disk space (since FJM doesn't try to process those corrupt/empty files anyway). It's a safe cleanup: it makes sure the tx IDs of empty / corrupt files are old enough before purging them. So I think we should do the same in QJM.

Our main target here is _stale_ in-progress edit log files, which are not necessarily empty/corrupt (so they won't be marked as such). As the updated description states, we want to properly take care of those files so QJM doesn't try to process them.

I like your proposal to rename / move aside those files and remove them once they are older than {{minTxIdToKeep}}. I'll update the patch based on this idea. I also propose we do the same for corrupt / empty files, for both FJM and QJM.

> QJM doesn't move aside stale inprogress edits files
> ---------------------------------------------------
>
>                 Key: HDFS-8178
>                 URL: https://issues.apache.org/jira/browse/HDFS-8178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: qjm
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8178.000.patch
>
>
> When a QJM crashes, the in-progress edit log file at that time remains in the
> file system. When the node comes back, it will accept new edit logs and those
> stale in-progress files are never cleaned up. QJM treats them as regular
> in-progress edit log files and tries to finalize them, which potentially
> causes high memory usage. This JIRA aims to move aside those stale edit log
> files to avoid this scenario.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
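For illustration only, the "move aside, then purge once older than {{minTxIdToKeep}}" idea from the comment could be sketched roughly as below. This is not the HDFS-8178 patch and does not use Hadoop classes. The class name, the method names, and the `.stale` suffix are hypothetical, and the staleness rule (an in-progress file whose start tx ID is lower than that of the segment currently being written) is an assumption made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: decides, from file names alone, which in-progress
// edit log files to move aside and which moved-aside files to purge.
class StaleEditsSketch {
    // Assumed suffix for files that have been moved aside.
    static final String STALE_SUFFIX = ".stale";

    // edits_inprogress_<startTxId>, optionally already carrying the suffix.
    private static final Pattern IN_PROGRESS =
        Pattern.compile("edits_inprogress_(\\d+)(\\.stale)?");

    /**
     * Returns the new name for a stale in-progress file, or null if the file
     * should be left alone (not an in-progress file, already moved aside, or
     * it is the segment currently being written).
     */
    static String moveAsideName(String fileName, long currentSegmentTxId) {
        Matcher m = IN_PROGRESS.matcher(fileName);
        if (!m.matches() || m.group(2) != null) {
            return null;
        }
        long startTxId = Long.parseLong(m.group(1));
        return startTxId < currentSegmentTxId ? fileName + STALE_SUFFIX : null;
    }

    /**
     * Returns true only for files that were already moved aside and whose
     * start tx ID is older than minTxIdToKeep.
     */
    static boolean shouldPurge(String fileName, long minTxIdToKeep) {
        Matcher m = IN_PROGRESS.matcher(fileName);
        if (!m.matches() || m.group(2) == null) {
            return false;
        }
        return Long.parseLong(m.group(1)) < minTxIdToKeep;
    }

    public static void main(String[] args) {
        String stale = "edits_inprogress_0000000000000000005";
        String renamed = moveAsideName(stale, 10L);
        System.out.println("rename " + stale + " -> " + renamed);
        System.out.println("purge? " + shouldPurge(renamed, 8L));
    }
}
```

Keeping the two decisions separate mirrors the comment's reasoning: moving a file aside only stops QJM from treating it as a live segment, while deletion waits until the tx ID is old enough to be safe.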