    [ https://issues.apache.org/jira/browse/HDFS-729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806441#action_12806441 ]

dhruba borthakur commented on HDFS-729:
---------------------------------------

> o basically the burden on removing duplicates is passed on to the c

This is correct. fsck will invoke this API only once and will print the list of the first 500 corrupted files; it will not invoke the API multiple times. This list helps the administrator because he/she can get a partial list of corrupted files very quickly. My theory is that this partial list is better than waiting for a full fsck to finish, which can take hours.
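To make the single-call behavior concrete, here is a minimal Java sketch of how the fsck side could consume such an API. The interface name NameNodeView and the method getCorruptFiles() are illustrative assumptions based on this discussion, not the committed HDFS-729 signature.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;

public class CorruptFileLister {

  // Illustrative stand-in for the namenode-side call discussed here.
  // The name getCorruptFiles() and the FileStatus[] return type are
  // assumptions, not the committed HDFS-729 API. The server caps the
  // result (first 500 corrupted files); there is no paging.
  interface NameNodeView {
    FileStatus[] getCorruptFiles() throws IOException;
  }

  // fsck-style usage: exactly one call, print the partial list, and do
  // no client-side de-duplication or retries.
  static void printCorruptFiles(NameNodeView namenode) throws IOException {
    FileStatus[] corrupt = namenode.getCorruptFiles(); // single invocation
    System.out.println("Corrupted files returned (partial list): " + corrupt.length);
    for (FileStatus stat : corrupt) {
      System.out.println(stat.getPath());
    }
  }
}
{code}

Because the result is capped, this prints at most 500 paths and exits; an administrator who wants the complete picture still runs a full fsck.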
> fsck option to list only corrupted files
> ----------------------------------------
>
>                 Key: HDFS-729
>                 URL: https://issues.apache.org/jira/browse/HDFS-729
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: badFiles.txt, badFiles2.txt, corruptFiles.txt
>
>
> An option to fsck to list only corrupted files will be very helpful for frequent monitoring.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.