[ https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ramkumar Vadali updated MAPREDUCE-2167:
---------------------------------------

Attachment: MAPREDUCE-2167.4.patch

Fixed a broken test.

TEST RESULTS:

ant test-patch has the same number of failures as a clean checkout:
{code}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 4 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] -1 findbugs. The patch appears to introduce 13 new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
[exec]
[exec]
[exec]
[exec]
[exec] ======================================================================
[exec] ======================================================================
[exec] Finished build.
[exec] ======================================================================
[exec] ======================================================================
[exec]
[exec]
{code}

ant test succeeds:
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 47.071 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 124.583 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.337 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.481 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.392 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.485 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 71.136 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 471.072 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.828 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec

test:

BUILD SUCCESSFUL
Total time: 15 minutes 6 seconds
{code}

> Faster directory traversal for raid node
> ----------------------------------------
>
> Key: MAPREDUCE-2167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/raid
> Reporter: Ramkumar Vadali
> Assignee: Ramkumar Vadali
> Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, MAPREDUCE-2167.4.patch, MAPREDUCE-2167.patch
>
>
> The RaidNode currently iterates over the directory structure to figure out which files to RAID. With millions of files, this can take a long time, especially if some files are already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine whether a file needs to be RAIDed.
> The directory traversal is encapsulated inside the class DirectoryTraversal, which examines one file at a time using the caller's thread.
> My proposal is to make this multi-threaded as follows:
> * Use a pool of threads inside DirectoryTraversal.
> * The caller's thread is used to retrieve directories, and each new directory is assigned to a thread in the pool. The worker thread examines all the files in the directory.
> * If there are sub-directories, those are added back to the pool as work items.
> Comments?
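The sketch below shows one way the proposed scheme could look. It is not the code in the attached patches. It makes these assumptions:

* It walks a local file system with java.nio.file rather than HDFS.
* The class name ParallelDirectoryTraversal is hypothetical.
* The needsRaid Predicate is a hypothetical stand-in for the RaidNode's per-file check, including any parity file or HAR lookups.

The caller's thread takes directories from a queue and assigns each one to a fixed thread pool. A worker lists one directory, keeps the files that pass the check, and puts any sub-directories back on the queue as new work items. A count of pending directories tells the caller when the traversal is finished.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch: not a Hadoop class; uses the local file system via java.nio.file, not HDFS.
final class ParallelDirectoryTraversal {
    private final ExecutorService pool;
    // Hypothetical stand-in for "does this file need to be RAIDed?" (parity/HAR checks).
    private final Predicate<Path> needsRaid;
    // Directories waiting to be assigned; filled by workers, drained by the caller's thread.
    private final BlockingQueue<Path> directories = new LinkedBlockingQueue<>();
    // Files selected by needsRaid; written concurrently by workers.
    private final Queue<Path> selected = new ConcurrentLinkedQueue<>();
    // Number of directories that are queued or still being examined.
    private final AtomicInteger pending = new AtomicInteger();

    ParallelDirectoryTraversal(int threads, Predicate<Path> needsRaid) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.needsRaid = needsRaid;
    }

    /** Single use: the thread pool is shut down when the traversal finishes. */
    List<Path> traverse(Path root) throws InterruptedException {
        pending.incrementAndGet();
        directories.add(root);
        try {
            // The caller's thread only retrieves directories and hands them to the pool.
            while (pending.get() > 0) {
                Path dir = directories.poll(10, TimeUnit.MILLISECONDS);
                if (dir != null) {
                    pool.submit(() -> examine(dir));
                }
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(selected);
    }

    // Worker: examine every entry of a single directory.
    private void examine(Path dir) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    // Count the sub-directory before queuing it so pending cannot hit 0 early.
                    pending.incrementAndGet();
                    directories.add(entry);
                } else if (needsRaid.test(entry)) {
                    selected.add(entry);
                }
            }
        } catch (IOException e) {
            // Unreadable directory: skipped in this sketch; real code would log it.
        } finally {
            pending.decrementAndGet();
        }
    }
}
```

A directory's count is added before the directory goes onto the queue, and it is removed only after its parent has been fully listed. The pending count therefore drops to zero only when no directory is queued or in progress. Returning the selected files in bulk keeps the sketch short. A real DirectoryTraversal might instead hand files to the caller incrementally.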