[ https://issues.apache.org/jira/browse/HADOOP-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868334#action_12868334 ]
Doug Cutting commented on HADOOP-6761:
--------------------------------------

Overall this looks good. Some comments:
- The unit test takes a full minute. Might we change it to take only 10 seconds or so? That would require changing both parameters to floats, which I don't think is unreasonable.
- When the emptier interval is misconfigured, shouldn't we print a warning?
- The new config parameter might better be called 'checkpoint' rather than 'check', since checking and checkpointing mean very different things for a filesystem.

> Improve Trash Emptier
> ---------------------
>
> Key: HADOOP-6761
> URL: https://issues.apache.org/jira/browse/HADOOP-6761
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Dmytro Molkov
> Assignee: Dmytro Molkov
> Attachments: HADOOP-6761.2.patch, HADOOP-6761.3.patch, HADOOP-6761.patch
>
> There are two inefficiencies in the Trash functionality right now that have caused some problems for us.
> First, if you configure your trash interval to be one day (24 hours), you eventually store 2 days' worth of data: Current and the previous timestamped checkpoint, which is not deleted until the end of the interval.
> The second problem is that a lot of data accumulates in Trash before the Emptier wakes up. If a couple of million files are trashed and the Emptier deletes them on HDFS, the NameNode freezes until everything is removed (this particular problem will hopefully be addressed by HDFS-1143).
> My proposal is to have two configuration intervals: one for deleting the trashed data and another for checkpointing. For example, with intervals of one day and one hour we will store only 25 hours of data instead of the current 48, and deletions will happen in smaller chunks every hour of the day instead of in one huge deletion at the end of the day.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
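The 25-versus-48-hour arithmetic in the proposal and the suggested warning for a misconfigured interval can be illustrated with a small Java sketch. It is a simplified, hypothetical model. `TrashEmptierSketch`, `retentionMinutes` and `checkpointIntervalMs` are illustrative names, not Hadoop's Trash or Emptier API, and no real config keys are used. The model assumes that at each wakeup the emptier first deletes checkpoints whose age has reached the deletion interval and then moves Current into a new checkpoint. In line with the review comment, the intervals are taken as float minutes, so a test can use fractions of a minute. A non-positive checkpoint interval, or one longer than the deletion interval, triggers a warning and falls back to the deletion interval.

```java
/**
 * Hypothetical, simplified model of a trash emptier with two intervals:
 * a deletion interval and a (usually shorter) checkpoint interval.
 * Names are illustrative; this is not Hadoop's actual Trash/Emptier code.
 */
public class TrashEmptierSketch {

    /**
     * Converts intervals given in (possibly fractional) minutes to ms and
     * returns the checkpoint interval. A misconfigured checkpoint interval
     * prints a warning and falls back to the deletion interval, which is
     * the old single-interval behavior.
     */
    static long checkpointIntervalMs(float deletionMin, float checkpointMin) {
        long deletionMs = (long) (deletionMin * 60_000.0);
        long checkpointMs = (long) (checkpointMin * 60_000.0);
        if (checkpointMs <= 0 || checkpointMs > deletionMs) {
            System.err.println("WARN: checkpoint interval " + checkpointMin
                + " min is invalid for deletion interval " + deletionMin
                + " min; using the deletion interval instead.");
            return deletionMs;
        }
        return checkpointMs;
    }

    /**
     * Minutes a file trashed at minute trashedAt survives. The emptier wakes
     * at every multiple of checkpointMin (starting at checkpointMin). At each
     * wakeup it (1) deletes checkpoints at least deletionMin old, then
     * (2) moves Current into a checkpoint stamped with the wakeup time.
     */
    static long retentionMinutes(long deletionMin, long checkpointMin, long trashedAt) {
        if (deletionMin <= 0 || checkpointMin <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        Long fileCheckpoint = null; // creation time of the checkpoint holding the file
        for (long now = checkpointMin; ; now += checkpointMin) {
            // Step 1: expire the file's checkpoint once it is old enough.
            if (fileCheckpoint != null && now - fileCheckpoint >= deletionMin) {
                return now - trashedAt;
            }
            // Step 2: checkpoint Current; the file sits there once trashed.
            if (fileCheckpoint == null && trashedAt < now) {
                fileCheckpoint = now;
            }
        }
    }

    public static void main(String[] args) {
        // Worst case: the file is trashed just after a wakeup (minute 1).
        long single = retentionMinutes(1440, 1440, 1); // one 24h interval
        long split = retentionMinutes(1440, 60, 1);    // 24h deletion, 1h checkpoint
        System.out.println("one 24h interval: " + single + " min");
        System.out.println("24h deletion / 1h checkpoint: " + split + " min");
    }
}
```

In this model, a file trashed just after a wakeup survives 2879 minutes (about 48 hours) with a single 24-hour interval. With a 24-hour deletion interval and hourly checkpoints it survives 1499 minutes (about 25 hours), which matches the figures in the issue. The sketch ignores clock skew and how real checkpoint timestamps are rounded, so treat these as approximate worst cases, not as Hadoop's exact behavior.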