[jira] [Created] (MAPREDUCE-2570) Bug in RAID FS (DistributedRaidFileSystem) unraid path
Bug in RAID FS (DistributedRaidFileSystem) unraid path
Key: MAPREDUCE-2570
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2570
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

The un-raid path in DistributedRaidFileSystem goes through RaidNode.unRaidCorruptBlock(), which has a bug when the parity file is inside a HAR. The temporary file that holds the recovered block contents is created in the filesystem that hosts the parity file. If the parity file is inside a HAR, that filesystem is HarFileSystem, which is read-only, so creating the temporary file fails. The fix is a one-line change to use the underlying filesystem of the HAR instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
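A minimal, self-contained model of the one-line fix might look like this. `SimpleFs`, `WritableFs`, `ArchiveFs` and `TempFileLocator` are hypothetical stand-ins for the Hadoop filesystem classes, not the code from the actual patch: the temporary file goes to the archive's underlying filesystem whenever the parity file lives inside a read-only archive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal filesystem abstraction (not Hadoop's FileSystem API).
interface SimpleFs {
  void create(String path);
}

// Writable filesystem that just records the paths created on it.
class WritableFs implements SimpleFs {
  final List<String> created = new ArrayList<>();
  public void create(String path) { created.add(path); }
}

// Read-only archive view layered over another filesystem, standing in for a HAR.
class ArchiveFs implements SimpleFs {
  private final SimpleFs underlying;
  ArchiveFs(SimpleFs underlying) { this.underlying = underlying; }
  SimpleFs getUnderlying() { return underlying; }
  public void create(String path) {
    throw new UnsupportedOperationException("archive is read-only: " + path);
  }
}

class TempFileLocator {
  // Choose where to create the temporary recovered-block file: if the parity
  // file is inside an archive, unwrap it and use the underlying filesystem.
  static SimpleFs tempFileSystem(SimpleFs parityFs) {
    if (parityFs instanceof ArchiveFs) {
      return ((ArchiveFs) parityFs).getUnderlying();
    }
    return parityFs;
  }
}
```

Creating the file directly on the archive view throws, while `TempFileLocator.tempFileSystem(har).create(...)` succeeds on the underlying filesystem.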
[jira] [Commented] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()
[ https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038086#comment-13038086 ]

Ramkumar Vadali commented on MAPREDUCE-2186:

The main motivation for opening this JIRA was to allow CombineFileInputFormat to work when there are missing blocks. CombineFileInputFormat determines the host/rack information for input blocks and uses it to create input splits. It does not handle the case where a block has no host/rack information.

The proposed fix, returning the locations of parity blocks when source blocks are missing, is not good because it fixes the problem in the wrong place. It also gives us false locality. Instead of changing RAID FS to handle this case, it's better to fix CFIF to handle missing blocks (MAPREDUCE-2185).

DistributedRaidFileSystem should implement getFileBlockLocations()
Key: MAPREDUCE-2186
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

If a RAIDed file has missing blocks, DistributedRaidFileSystem.getFileBlockLocations() would return no block locations. This could lead a client to believe that the file is not readable. But if parity data is available, the file actually is readable. It would be better to implement getFileBlockLocations() and return the locations of the parity blocks needed to reconstruct the missing block.
[jira] [Resolved] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()
[ https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali resolved MAPREDUCE-2186.
Resolution: Won't Fix

Better to fix MAPREDUCE-2185.

DistributedRaidFileSystem should implement getFileBlockLocations()
Key: MAPREDUCE-2186
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

If a RAIDed file has missing blocks, DistributedRaidFileSystem.getFileBlockLocations() would return no block locations. This could lead a client to believe that the file is not readable. But if parity data is available, the file actually is readable. It would be better to implement getFileBlockLocations() and return the locations of the parity blocks needed to reconstruct the missing block.
[jira] [Assigned] (MAPREDUCE-2498) TestRaidShellFsck failing on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali reassigned MAPREDUCE-2498:
Assignee: Ramkumar Vadali (was: Todd Lipcon)

TestRaidShellFsck failing on trunk
Key: MAPREDUCE-2498
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2498
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
Fix For: 0.23.0
Attachments: mapreduce-2498.txt

TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2 has been failing for the last several builds:

Error Message: parity file not HARed after 40s

java.io.IOException: parity file not HARed after 40s
    at org.apache.hadoop.raid.TestRaidShellFsck.raidTestFiles(TestRaidShellFsck.java:281)
    at org.apache.hadoop.raid.TestRaidShellFsck.setUp(TestRaidShellFsck.java:181)
    at org.apache.hadoop.raid.TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2(TestRaidShellFsck.java:666)
[jira] [Updated] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2185:
Attachment: MAPREDUCE-2185.patch

For blocks that do not have hosts associated with them, use NetworkTopology.DEFAULT_RACK as the rack location. This avoids the infinite loop later on in getMoreSplits().

Infinite loop at creating splits using CombineFileInputFormat
Key: MAPREDUCE-2185
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Attachments: MAPREDUCE-2185.patch

This is caused by a missing block in HDFS, whose list of locations is therefore empty. The following code adds the block to the blockToNodes map but not to the rackToBlocks map. Later, when generating splits, only blocks in rackToBlocks are removed from the blockToNodes map, so blockToNodes can never become empty, which causes an infinite loop.

{code}
// add this block to the block --> node locations map
blockToNodes.put(oneblock, oneblock.hosts);

// add this block to the rack --> block map
for (int j = 0; j < oneblock.racks.length; j++) {
  ..
}
{code}
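The bookkeeping and the default-rack fallback can be illustrated with a much-simplified, self-contained model. `OneBlock`, `SplitBookkeeping` and `generateSplits` are hypothetical names rather than the real CombineFileInputFormat code, and the split loop only drains `blockToNodes` rack by rack. The `"/default-rack"` string matches the value of Hadoop's `NetworkTopology.DEFAULT_RACK`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified block descriptor.
class OneBlock {
  final String id;
  final String[] hosts;
  final String[] racks;
  OneBlock(String id, String[] hosts, String[] racks) {
    this.id = id; this.hosts = hosts; this.racks = racks;
  }
}

class SplitBookkeeping {
  // Same value as Hadoop's NetworkTopology.DEFAULT_RACK.
  static final String DEFAULT_RACK = "/default-rack";

  final Map<OneBlock, String[]> blockToNodes = new HashMap<>();
  final Map<String, List<OneBlock>> rackToBlocks = new HashMap<>();

  void add(OneBlock oneblock) {
    // add this block to the block --> node locations map
    blockToNodes.put(oneblock, oneblock.hosts);
    // A block with no hosts (e.g. a missing block) is filed under the default
    // rack, so the split loop can still find and remove it.
    String[] racks = oneblock.hosts.length == 0
        ? new String[] {DEFAULT_RACK}
        : oneblock.racks;
    // add this block to the rack --> block map
    for (int j = 0; j < racks.length; j++) {
      rackToBlocks.computeIfAbsent(racks[j], r -> new ArrayList<>()).add(oneblock);
    }
  }

  // Drains blockToNodes by walking the racks and returns the number of passes.
  // A block absent from rackToBlocks would never be removed; maxPasses turns
  // that infinite loop into an exception in this sketch.
  int generateSplits(int maxPasses) {
    int passes = 0;
    while (!blockToNodes.isEmpty()) {
      if (++passes > maxPasses) {
        throw new IllegalStateException("blockToNodes never drained");
      }
      for (List<OneBlock> blocks : rackToBlocks.values()) {
        for (OneBlock b : blocks) {
          blockToNodes.remove(b);
        }
      }
    }
    return passes;
  }
}
```

With the fallback, a block that has no hosts lands under `/default-rack` and the map drains in a single pass.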
[jira] [Updated] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2185:
Assignee: Ramkumar Vadali (was: Hairong Kuang)
Status: Patch Available (was: Open)

Infinite loop at creating splits using CombineFileInputFormat
Key: MAPREDUCE-2185
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission
Reporter: Hairong Kuang
Assignee: Ramkumar Vadali
Attachments: MAPREDUCE-2185.patch

This is caused by a missing block in HDFS, whose list of locations is therefore empty. The following code adds the block to the blockToNodes map but not to the rackToBlocks map. Later, when generating splits, only blocks in rackToBlocks are removed from the blockToNodes map, so blockToNodes can never become empty, which causes an infinite loop.

{code}
// add this block to the block --> node locations map
blockToNodes.put(oneblock, oneblock.hosts);

// add this block to the rack --> block map
for (int j = 0; j < oneblock.racks.length; j++) {
  ..
}
{code}
[jira] [Created] (MAPREDUCE-2482) Enable RAID contrib in trunk
Enable RAID contrib in trunk
Key: MAPREDUCE-2482
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2482
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

The RAID contrib project can be re-enabled now that the federation-related changes are in.
[jira] [Resolved] (MAPREDUCE-2482) Enable RAID contrib in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali resolved MAPREDUCE-2482.
Resolution: Duplicate

Duplicate of MAPREDUCE-2467. For some reason I thought MAPREDUCE-2467 had been committed.

Enable RAID contrib in trunk
Key: MAPREDUCE-2482
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2482
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

The RAID contrib project can be re-enabled now that the federation-related changes are in.
[jira] [Commented] (MAPREDUCE-2467) HDFS-1052 changes break the raid contrib module in MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031357#comment-13031357 ]

Ramkumar Vadali commented on MAPREDUCE-2467:

Hi Suresh,

Sorry for the delay in responding. I think the test failures are unrelated:
1. testConcurrentJobs is failing because one file is not detected as corrupt, so the block fixer does not fix it. This seems to be intermittent: I ran the test twice and it succeeded both times. I don't think this is related to federation. Perhaps we can track it separately?
2. testFileBlockAndParityBlockMissingHar2 failed because of insufficient heap space when running a HAR job through LocalJobRunner. Again, this is unrelated to federation.
3. testJobQueues failed because of a timeout. RAID changes cannot affect core mapred tests, so this must be unrelated as well.

HDFS-1052 changes break the raid contrib module in MapReduce
Key: MAPREDUCE-2467
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2467
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Fix For: 0.23.0
Attachments: MR-2467.1.patch, MR-2467.2.patch, MR-2467.3.patch, MR-2467.patch

The RAID contrib module requires changes to work with the federation changes made in HDFS-1052.
[jira] [Commented] (MAPREDUCE-2467) HDFS-1052 changes break the raid contrib module in MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028502#comment-13028502 ]

Ramkumar Vadali commented on MAPREDUCE-2467:

+1 looks good

HDFS-1052 changes break the raid contrib module in MapReduce
Key: MAPREDUCE-2467
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2467
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Fix For: 0.23.0
Attachments: MR-2467.1.patch, MR-2467.patch

The RAID contrib module requires changes to work with the federation changes made in HDFS-1052.
[jira] [Commented] (MAPREDUCE-2465) HDFS raid not compiling after federation merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028504#comment-13028504 ]

Ramkumar Vadali commented on MAPREDUCE-2465:

Suresh, the patch for MAPREDUCE-2467 looks good.

HDFS raid not compiling after federation merge
Key: MAPREDUCE-2465
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2465
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Blocker
Attachments: disable-raid-compilation.txt, failure.txt, fix-compile-but-raid-broken.txt

The RAID contrib no longer compiles now that federation has been merged, due to some API changes in LocatedBlock and FSDataset.
[jira] [Commented] (MAPREDUCE-2467) HDFS-1052 changes break the raid contrib module in MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028503#comment-13028503 ]

Ramkumar Vadali commented on MAPREDUCE-2467:

Thanks for making the changes!

HDFS-1052 changes break the raid contrib module in MapReduce
Key: MAPREDUCE-2467
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2467
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Fix For: 0.23.0
Attachments: MR-2467.1.patch, MR-2467.patch

The RAID contrib module requires changes to work with the federation changes made in HDFS-1052.
[jira] [Commented] (MAPREDUCE-2465) HDFS raid not compiling after federation merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027941#comment-13027941 ]

Ramkumar Vadali commented on MAPREDUCE-2465:

I will work on making this compile.

HDFS raid not compiling after federation merge
Key: MAPREDUCE-2465
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2465
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Blocker
Attachments: disable-raid-compilation.txt, failure.txt, fix-compile-but-raid-broken.txt

The RAID contrib no longer compiles now that federation has been merged, due to some API changes in LocatedBlock and FSDataset.
[jira] [Created] (MAPREDUCE-2436) RAID block fixer should prioritize block fix operations
RAID block fixer should prioritize block fix operations
Key: MAPREDUCE-2436
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2436
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

The RAID block fixer submits MapReduce jobs to fix corrupt files. This is OK for XOR RAID, but with Reed-Solomon RAID there can be a large number of corrupt files when even a single datanode dies. With Reed-Solomon RAID, it is better to categorize corrupt files by urgency: files with only one corrupt block can be treated as lower priority than files with more corrupt blocks.
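One way to express the proposed categorization is a priority queue ordered by corrupt-block count. This is a sketch under the assumption that the block fixer already knows how many blocks of each file are corrupt; `CorruptFile`, `FixPriority` and the two-level `isHighPriority` split are hypothetical, not the design that was eventually committed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical record of a corrupt file and how many of its blocks are corrupt.
class CorruptFile {
  final String path;
  final int corruptBlocks;
  CorruptFile(String path, int corruptBlocks) {
    this.path = path;
    this.corruptBlocks = corruptBlocks;
  }
}

class FixPriority {
  // More corrupt blocks means less remaining redundancy, so those files come first.
  static final Comparator<CorruptFile> URGENCY =
      Comparator.comparingInt((CorruptFile f) -> f.corruptBlocks).reversed();

  // Returns file paths in the order they should be fixed.
  static List<String> fixOrder(Collection<CorruptFile> files) {
    PriorityQueue<CorruptFile> queue = new PriorityQueue<>(URGENCY);
    queue.addAll(files);
    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().path);
    }
    return order;
  }

  // Hypothetical two-level split: a single corrupt block is low priority.
  static boolean isHighPriority(CorruptFile f) {
    return f.corruptBlocks > 1;
  }
}
```

For files `/a` (1 corrupt block), `/b` (3) and `/c` (2), `fixOrder` yields `/b`, `/c`, `/a`.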
[jira] [Commented] (MAPREDUCE-2395) TestBlockFixer timing out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009779#comment-13009779 ]

Ramkumar Vadali commented on MAPREDUCE-2395:

Yes, I saw that but could not reproduce it. It is also odd, since this patch contains only test code changes.

TestBlockFixer timing out on trunk
Key: MAPREDUCE-2395
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
Fix For: 0.23.0
Attachments: MAPREDUCE-2395.patch

In recent Hudson builds, TestBlockFixer has been timing out. It is not clear how long it has been broken, since MAPREDUCE-2394 was hiding the RAID tests from Hudson's test result parsing.
[jira] Updated: (MAPREDUCE-2368) RAID DFS regression
[ https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2368:
Status: Open (was: Patch Available)

Will resubmit the patch now that the MiniMRCluster delays are resolved.

RAID DFS regression
Key: MAPREDUCE-2368
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Fix For: 0.20.3
Attachments: MAPREDUCE-2368.patch

The patch for MAPREDUCE-2248 did not handle zero-length files correctly, which leads to an ArrayIndexOutOfBoundsException when opening a zero-length file. That case needs special handling.
[jira] Updated: (MAPREDUCE-2368) RAID DFS regression
[ https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2368:
Status: Patch Available (was: Open)

RAID DFS regression
Key: MAPREDUCE-2368
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Fix For: 0.20.3
Attachments: MAPREDUCE-2368.patch

The patch for MAPREDUCE-2248 did not handle zero-length files correctly, which leads to an ArrayIndexOutOfBoundsException when opening a zero-length file. That case needs special handling.
[jira] Assigned: (MAPREDUCE-2395) TestBlockFixer timing out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali reassigned MAPREDUCE-2395:
Assignee: Ramkumar Vadali

TestBlockFixer timing out on trunk
Key: MAPREDUCE-2395
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
Fix For: 0.23.0

In recent Hudson builds, TestBlockFixer has been timing out. It is not clear how long it has been broken, since MAPREDUCE-2394 was hiding the RAID tests from Hudson's test result parsing.
[jira] Updated: (MAPREDUCE-2395) TestBlockFixer timing out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2395:
Attachment: MAPREDUCE-2395.patch

Breaks TestBlockFixer into several tests. TestBlockFixer.java now contains the tests that do not use a MiniMRCluster; the other TestBlockFixer*.java files each contain a few tests that do use MiniMRCluster.

TestBlockFixer timing out on trunk
Key: MAPREDUCE-2395
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
Fix For: 0.23.0
Attachments: MAPREDUCE-2395.patch

In recent Hudson builds, TestBlockFixer has been timing out. It is not clear how long it has been broken, since MAPREDUCE-2394 was hiding the RAID tests from Hudson's test result parsing.
[jira] Updated: (MAPREDUCE-2395) TestBlockFixer timing out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2395:
Status: Patch Available (was: Open)

TestBlockFixer timing out on trunk
Key: MAPREDUCE-2395
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
Fix For: 0.23.0
Attachments: MAPREDUCE-2395.patch

In recent Hudson builds, TestBlockFixer has been timing out. It is not clear how long it has been broken, since MAPREDUCE-2394 was hiding the RAID tests from Hudson's test result parsing.
[jira] Created: (MAPREDUCE-2368) RAID DFS regression
RAID DFS regression
Key: MAPREDUCE-2368
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Fix For: 0.20.3

The patch for MAPREDUCE-2248 did not handle zero-length files correctly, which leads to an ArrayIndexOutOfBoundsException when opening a zero-length file. That case needs special handling.
[jira] Updated: (MAPREDUCE-2368) RAID DFS regression
[ https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2368:
Attachment: MAPREDUCE-2368.patch

This patch handles the len == 0 case.

RAID DFS regression
Key: MAPREDUCE-2368
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Fix For: 0.20.3
Attachments: MAPREDUCE-2368.patch

The patch for MAPREDUCE-2248 did not handle zero-length files correctly, which leads to an ArrayIndexOutOfBoundsException when opening a zero-length file. That case needs special handling.
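The failure mode and the guard can be illustrated with a hypothetical block lookup that derives the block size from the first block. `BlockLookup.blockIndexFor` is an illustrative name, not the DistributedRaidFileSystem code. A zero-length file has no blocks, so `blockLengths[0]` would throw `ArrayIndexOutOfBoundsException` unless `len == 0` is handled first.

```java
// Hypothetical sketch of a len == 0 guard; not the code from MAPREDUCE-2368.patch.
class BlockLookup {
  // Returns the index of the block containing byte `offset`, or -1 if there is none.
  static int blockIndexFor(long offset, long fileLength, long[] blockLengths) {
    if (fileLength == 0 || blockLengths.length == 0) {
      // Zero-length file: there are no blocks, so reading blockLengths[0]
      // below would throw ArrayIndexOutOfBoundsException.
      return -1;
    }
    if (offset < 0 || offset >= fileLength) {
      return -1;
    }
    // All blocks except possibly the last have the same size as the first one.
    long blockSize = blockLengths[0];
    return (int) (offset / blockSize);
  }
}
```

For a 25-byte file with blocks of 10, 10 and 5 bytes, offset 24 maps to block 2, while any lookup on a zero-length file returns -1 instead of throwing.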
[jira] Commented: (MAPREDUCE-2368) RAID DFS regression
[ https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004133#comment-13004133 ]

Ramkumar Vadali commented on MAPREDUCE-2368:

The failed contrib tests were in mumak and raid. I will take a look at the raid test failure.

[exec] [junit] Test org.apache.hadoop.mapred.TestSimulatorEndToEnd FAILED (timeout)
[exec] [junit] Test org.apache.hadoop.mapred.TestSimulatorSerialJobSubmission FAILED (timeout)
[exec] [junit] Test org.apache.hadoop.mapred.TestSimulatorStressJobSubmission FAILED (timeout)
[exec] [junit] Test org.apache.hadoop.raid.TestBlockFixer FAILED (timeout)

RAID DFS regression
Key: MAPREDUCE-2368
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Fix For: 0.20.3
Attachments: MAPREDUCE-2368.patch

The patch for MAPREDUCE-2248 did not handle zero-length files correctly, which leads to an ArrayIndexOutOfBoundsException when opening a zero-length file. That case needs special handling.
[jira] Commented: (MAPREDUCE-2239) BlockPlacementPolicyRaid should call getBlockLocations only when necessary
[ https://issues.apache.org/jira/browse/MAPREDUCE-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13002493#comment-13002493 ]

Ramkumar Vadali commented on MAPREDUCE-2239:

+1 Patch looks good

BlockPlacementPolicyRaid should call getBlockLocations only when necessary
Key: MAPREDUCE-2239
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2239
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Scott Chen
Assignee: Scott Chen
Fix For: 0.23.0
Attachments: MAPREDUCE-2239-1.txt, MAPREDUCE-2239-2.txt, MAPREDUCE-2239-3.txt, MAPREDUCE-2239.txt

Currently BlockPlacementPolicyRaid calls getBlockLocations for every chooseTarget(), which puts pressure on the NameNode. We should skip the call if the file is neither a raided file nor a parity file.
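The idea reduces to a cheap relevance check in front of the expensive lookup. In this sketch `PlacementPolicySketch`, the set of raided source paths and the parity-path prefix are all hypothetical; the real policy determines these differently, and the counter merely stands in for `getBlockLocations` calls to the NameNode.

```java
import java.util.Set;

// Hypothetical sketch: only RAID-relevant files pay for a block-location lookup.
class PlacementPolicySketch {
  private final Set<String> raidedSources; // source files that have parity (assumed known)
  private final String parityPrefix;       // hypothetical root directory of parity files
  private int lookups = 0;                 // counts simulated getBlockLocations calls

  PlacementPolicySketch(Set<String> raidedSources, String parityPrefix) {
    this.raidedSources = raidedSources;
    this.parityPrefix = parityPrefix;
  }

  boolean isRaidRelevant(String path) {
    return raidedSources.contains(path) || path.startsWith(parityPrefix);
  }

  // Stand-in for chooseTarget(): files that are neither raided nor parity
  // files get default placement with no NameNode round trip.
  void chooseTarget(String path) {
    if (isRaidRelevant(path)) {
      lookups++; // the real policy would call getBlockLocations() here
    }
  }

  int lookupCount() { return lookups; }
}
```

Placing one block each for a raided file, a parity file and an unraided file triggers only two lookups.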
[jira] Created: (MAPREDUCE-2347) RAID blockfixer should check file blocks after the file is fixed
RAID blockfixer should check file blocks after the file is fixed
Key: MAPREDUCE-2347
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2347
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.20.2
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

After a file is fixed by the block fixer, all its blocks should be checked for the presence of replicas. If any block is still missing valid replicas, the file should be fixed again.
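The proposed post-fix check could be sketched as follows, assuming the fixer can obtain a count of valid replicas per block once a fix job completes. `PostFixCheck`, the `replicaCounts` array and the `Deque` work queue are hypothetical, not BlockFixer internals.

```java
import java.util.Deque;

// Hypothetical sketch of MAPREDUCE-2347: verify a file after fixing it and
// re-queue it if any block still has no valid replica.
class PostFixCheck {
  // replicaCounts[i] is the number of valid replicas of block i (assumed input).
  static boolean allBlocksHaveReplicas(int[] replicaCounts) {
    for (int count : replicaCounts) {
      if (count == 0) {
        return false;
      }
    }
    return true;
  }

  // Called when a fix job finishes; returns true if the file was re-queued.
  static boolean onFixCompleted(String path, int[] replicaCounts, Deque<String> fixQueue) {
    if (allBlocksHaveReplicas(replicaCounts)) {
      return false;
    }
    fixQueue.addLast(path); // still missing replicas: fix it again
    return true;
  }
}
```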
[jira] Created: (MAPREDUCE-2333) RAID jobs should delete temporary files in the event of filesystem failures
RAID jobs should delete temporary files in the event of filesystem failures
Key: MAPREDUCE-2333
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2333
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

If the creation of a parity file or parity file HAR fails due to a filesystem-level error, RAID should delete the temporary files. Specifically, datanode death during parity file creation would cause FSDataOutputStream.close() to throw an IOException. The RAID code should delete such a file.
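The cleanup pattern can be demonstrated against the local filesystem with `java.nio.file`, rather than HDFS. `FailingCloseStream` is a hypothetical stand-in for an `FSDataOutputStream` whose `close()` fails after a datanode dies, and `ParityWriter.writeParity` is an illustrative name, not the RAID code.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stream whose close() fails, simulating a broken write pipeline.
class FailingCloseStream extends FilterOutputStream {
  FailingCloseStream(OutputStream out) { super(out); }
  @Override
  public void close() throws IOException {
    super.close();
    throw new IOException("simulated pipeline failure on close");
  }
}

class ParityWriter {
  // Writes `data` to `tmp`. If writing or closing fails, deletes the partial
  // file and rethrows so the caller knows parity creation failed.
  static void writeParity(Path tmp, byte[] data, boolean failOnClose) throws IOException {
    OutputStream raw = Files.newOutputStream(tmp);
    OutputStream out = failOnClose ? new FailingCloseStream(raw) : raw;
    try {
      out.write(data);
      out.close();
    } catch (IOException e) {
      try {
        raw.close(); // release the handle before deleting
      } catch (IOException ignored) {
        // already failing; keep the original exception
      }
      Files.deleteIfExists(tmp); // don't leave a partial parity file behind
      throw e;
    }
  }
}
```

When `close()` throws, the temporary file is gone and the caller still sees the original `IOException`.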
[jira] Created: (MAPREDUCE-2329) RAID BlockFixer should exclude temporary files
RAID BlockFixer should exclude temporary files
Key: MAPREDUCE-2329
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2329
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

RAID BlockFixer should exclude files matching the pattern ^/tmp/.*
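The exclusion rule maps directly onto `java.util.regex`. The pattern comes from the issue; `TmpFilter.filesToFix` is a hypothetical helper name. `Matcher.matches()` requires the whole path to match, so `/tmpdir/x` and `/user/tmp/x` are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: drop temporary files from the BlockFixer's work list.
class TmpFilter {
  static final Pattern EXCLUDE = Pattern.compile("^/tmp/.*");

  static List<String> filesToFix(List<String> corruptFiles) {
    List<String> result = new ArrayList<>();
    for (String path : corruptFiles) {
      if (!EXCLUDE.matcher(path).matches()) {
        result.add(path);
      }
    }
    return result;
  }
}
```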
[jira] Updated: (MAPREDUCE-2329) RAID BlockFixer should exclude temporary files
[ https://issues.apache.org/jira/browse/MAPREDUCE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2329:
Component/s: contrib/raid

RAID BlockFixer should exclude temporary files
Key: MAPREDUCE-2329
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2329
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

RAID BlockFixer should exclude files matching the pattern ^/tmp/.*
[jira] Updated: (MAPREDUCE-2320) RAID DistBlockFixer should limit pending jobs instead of pending files
[ https://issues.apache.org/jira/browse/MAPREDUCE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2320:
Component/s: contrib/raid
Affects Version/s: 0.20.3, 0.20.2
Issue Type: Improvement (was: Bug)

RAID DistBlockFixer should limit pending jobs instead of pending files
Key: MAPREDUCE-2320
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2320
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

DistBlockFixer limits the number of files being fixed simultaneously to avoid an unlimited backlog. However, this also limits the number of parallel jobs: if one job has a long-running task, it prevents newer jobs from being started. Instead, DistBlockFixer should limit the number of running jobs, so that one long-running task does not block other jobs.
[jira] Created: (MAPREDUCE-2320) RAID DistBlockFixer should limit pending jobs instead of pending files
RAID DistBlockFixer should limit pending jobs instead of pending files
Key: MAPREDUCE-2320
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2320
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

DistBlockFixer limits the number of files being fixed simultaneously to avoid an unlimited backlog. However, this also limits the number of parallel jobs: if one job has a long-running task, it prevents newer jobs from being started. Instead, DistBlockFixer should limit the number of running jobs, so that one long-running task does not block other jobs.
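The difference between the two admission policies shows up in a toy model. `FileLimiter` and `JobLimiter` are hypothetical classes, not DistBlockFixer code: with a file limit, one slow job holding every file slot blocks all new jobs; with a job limit, new jobs can start while slots remain.

```java
// Hypothetical: admission limited by files in flight (the behaviour the issue describes).
class FileLimiter {
  private final int maxPendingFiles;
  private int pendingFiles = 0;

  FileLimiter(int maxPendingFiles) { this.maxPendingFiles = maxPendingFiles; }

  boolean tryStart(int fileCount) {
    if (pendingFiles + fileCount > maxPendingFiles) {
      return false;
    }
    pendingFiles += fileCount;
    return true;
  }

  void finished(int fileCount) { pendingFiles -= fileCount; }
}

// Hypothetical: admission limited by running jobs, as the issue proposes.
class JobLimiter {
  private final int maxRunningJobs;
  private int runningJobs = 0;

  JobLimiter(int maxRunningJobs) { this.maxRunningJobs = maxRunningJobs; }

  boolean tryStart() {
    if (runningJobs >= maxRunningJobs) {
      return false;
    }
    runningJobs++;
    return true;
  }

  void finished() { runningJobs--; }
}
```

With `new FileLimiter(10)`, a job holding 10 files blocks `tryStart(1)` until it finishes, whereas `new JobLimiter(4)` still admits a second job.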
[jira] Created: (MAPREDUCE-2313) RAID code does not close some opened streams
RAID code does not close some opened streams
Key: MAPREDUCE-2313
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2313
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

In some places, opened streams are never closed, leading to a file descriptor leak.
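The standard Java remedy for this kind of leak is try-with-resources, which closes every declared stream even when the body throws. In this sketch, `TrackingStream` is a hypothetical wrapper that records whether `close()` ran, and `StreamUse.sumFirstBytes` is illustrative, not RAID code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that records whether it was closed.
class TrackingStream extends ByteArrayInputStream {
  boolean closed = false;
  TrackingStream(byte[] data) { super(data); }
  @Override
  public void close() throws IOException {
    closed = true;
    super.close();
  }
}

class StreamUse {
  // Reads the first byte of each stream. Both streams are closed on exit
  // (in reverse order of opening), whether or not an exception is thrown.
  static int sumFirstBytes(TrackingStream a, TrackingStream b) throws IOException {
    try (InputStream in1 = a; InputStream in2 = b) {
      int x = in1.read();
      int y = in2.read();
      if (x < 0 || y < 0) {
        throw new EOFException("empty stream");
      }
      return x + y;
    }
  }
}
```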
[jira] Created: (MAPREDUCE-2312) Better error handling in RaidShell
Better error handling in RaidShell
Key: MAPREDUCE-2312
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2312
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

If there is an error trying to find the parity information for a corrupt file, RaidShell should report the file as corrupt instead of bailing out.
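The requested behaviour amounts to catching the lookup failure per file and continuing the scan. `FsckSketch` and its `ParityLookup` interface are hypothetical stand-ins for RaidShell's internals. A file whose parity information cannot be read is reported as corrupt, just like one that is definitely unrecoverable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one failed parity lookup must not abort the whole scan.
class FsckSketch {
  interface ParityLookup {
    boolean canRecover(String path) throws IOException;
  }

  static List<String> reportCorrupt(List<String> corruptFiles, ParityLookup lookup) {
    List<String> report = new ArrayList<>();
    for (String path : corruptFiles) {
      boolean recoverable;
      try {
        recoverable = lookup.canRecover(path);
      } catch (IOException e) {
        // Parity information could not be determined: report the file as
        // corrupt and keep going instead of bailing out.
        recoverable = false;
      }
      if (!recoverable) {
        report.add(path);
      }
    }
    return report;
  }
}
```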
[jira] Created: (MAPREDUCE-2303) RAID BlockFixer should choose targets better
RAID BlockFixer should choose targets better
Key: MAPREDUCE-2303
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2303
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

The RAID BlockFixer chooses the destination of the generated block at random. It avoids nodes that have a corrupt replica of the block, but does nothing beyond that. It should also avoid datanodes that hold a replica of any source or parity block in the block's stripe.
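The improved rule can be sketched as excluding every node that holds any block of the stripe and then choosing randomly among the rest. This assumes the fixer knows the replica locations of each source and parity block in the stripe. `TargetChooser` and its parameters are hypothetical, and an empty result here would mean no node satisfies the constraint.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of stripe-aware target selection for a reconstructed block.
class TargetChooser {
  // stripeBlockLocations holds, for each source/parity block in the stripe,
  // the set of datanodes that have a replica of it (assumed known).
  static Optional<String> chooseTarget(List<String> liveNodes,
                                       List<Set<String>> stripeBlockLocations,
                                       Random random) {
    Set<String> excluded = new HashSet<>();
    for (Set<String> locations : stripeBlockLocations) {
      excluded.addAll(locations);
    }
    List<String> candidates = new ArrayList<>();
    for (String node : liveNodes) {
      if (!excluded.contains(node)) {
        candidates.add(node);
      }
    }
    if (candidates.isEmpty()) {
      return Optional.empty();
    }
    // Random choice, but only among nodes holding nothing from this stripe.
    return Optional.of(candidates.get(random.nextInt(candidates.size())));
  }
}
```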
[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
[ https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2267:
Status: Open (was: Patch Available)

Will upload another patch.

Parallelize reading of blocks within a stripe
Key: MAPREDUCE-2267
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch

RAID code has several places where multiple blocks of data must be read to perform an operation. For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks of the source file. Currently, these reads proceed sequentially. RAID code should use a thread pool to increase parallelism and thus reduce latency.
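The thread-pool approach might look like this with `java.util.concurrent`. `StripeReader` and its `BlockSource` interface are hypothetical stand-ins for reading one block from HDFS, not the attached patch. All reads are submitted up front, and the results are collected in stripe order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: read the blocks of a stripe concurrently.
class StripeReader {
  interface BlockSource {
    byte[] readBlock(int index) throws Exception;
  }

  static List<byte[]> readStripe(BlockSource source, int stripeLength, ExecutorService pool)
      throws InterruptedException, ExecutionException {
    // Submit every read first so they can proceed in parallel.
    List<Future<byte[]>> futures = new ArrayList<>();
    for (int i = 0; i < stripeLength; i++) {
      final int index = i;
      futures.add(pool.submit(() -> source.readBlock(index)));
    }
    // Collect results in stripe order; each get() waits only for its own read.
    List<byte[]> blocks = new ArrayList<>();
    for (Future<byte[]> f : futures) {
      blocks.add(f.get());
    }
    return blocks;
  }
}
```

The caller owns the `ExecutorService` and should shut it down when finished. A failed read surfaces as an `ExecutionException` from `get()`.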
[jira] Commented: (MAPREDUCE-2285) MiniMRCluster does not start after ant test-patch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988220#action_12988220 ] Ramkumar Vadali commented on MAPREDUCE-2285: The patch fixes the problem. I am no Ivy expert, but it looks good to me. MiniMRCluster does not start after ant test-patch - Key: MAPREDUCE-2285 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2285 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Ramkumar Vadali Priority: Blocker Attachments: cp-bad, cp-good, fix-build.diff Any test using MiniMRCluster hangs in the MiniMRCluster constructor after running ant test-patch. Steps to reproduce: 1. ant -Dpatch.file=<dummy patch to CHANGES.txt> -Dforrest.home=<path to forrest> -Dfindbugs.home=<path to findbugs> -Dscratch.dir=/tmp/testpatch -Djava5.home=<path to java5> test-patch 2. Run any test that creates a MiniMRCluster, say ant test -Dtestcase=TestFileArgs (contrib/streaming). Expected result: the test should succeed. Actual result: the test hangs in MiniMRCluster.init. This does not happen if we run ant clean after ant test-patch. Test output: {code} [junit] 11/01/27 12:11:43 INFO ipc.Server: IPC Server handler 3 on 58675: starting [junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:58675 [junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:58675 [junit] 11/01/27 12:11:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s). [junit] 11/01/27 12:11:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s). [junit] 11/01/27 12:11:46 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s). [junit] 11/01/27 12:11:47 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 3 time(s). 
[junit] 11/01/27 12:11:48 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 4 time(s). [junit] 11/01/27 12:11:49 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 5 time(s). [junit] 11/01/27 12:11:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 6 time(s). [junit] 11/01/27 12:11:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 7 time(s). [junit] 11/01/27 12:11:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 8 time(s). [junit] 11/01/27 12:11:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 9 time(s). [junit] 11/01/27 12:11:53 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not available yet, Z... {code} Stack trace: {code} at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:611) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:429) - locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:504) - locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:206) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164) at org.apache.hadoop.ipc.Client.call(Client.java:1008) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198) at org.apache.hadoop.mapred.$Proxy11.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:235) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:275) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:206) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:185) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:169) at org.apache.hadoop.mapred.TaskTracker$2.run(TaskTracker.java:699) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:695) - locked 0x7f3b8ccc3870 (a org.apache.hadoop.mapred.TaskTracker) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1391) at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.createTaskTracker(MiniMRCluster.java:219)
[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987669#action_12987669 ] Ramkumar Vadali commented on MAPREDUCE-2283: Update: If I run ant clean from the top-level and run `ant test -Dtestcase=TestBlockFixer`, it runs fine. But if I run ant test-patch from the top level and run it again, it gets stuck. I ran with test.output=yes to see what was going on, and found this: {code} [junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:50197 [junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:50197 [junit] 11/01/27 09:21:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s). [junit] 11/01/27 09:21:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s). [junit] 11/01/27 09:21:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s). [junit] 11/01/27 09:21:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 3 time(s). [junit] 11/01/27 09:21:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 4 time(s). [junit] 11/01/27 09:21:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 5 time(s). [junit] 11/01/27 09:21:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 6 time(s). [junit] 11/01/27 09:21:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 7 time(s). [junit] 11/01/27 09:21:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 8 time(s). [junit] 11/01/27 09:21:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 9 time(s). [junit] 11/01/27 09:21:34 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not available yet, Z... 
{code} I think Hudson does something like this, and ant test-patch is somehow pulling in a jar that prevents MiniMRCluster from starting. To check, I wrote a simple test that only tries to start a MiniMRCluster:
{code}
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;

public class TestStuckMiniMR extends TestCase {
  public static final int NUM_DATANODES = 3;
  Configuration conf;
  String namenode = null;
  MiniDFSCluster dfs = null;
  MiniMRCluster mr = null;
  String jobTrackerName = null;
  FileSystem fileSys = null;

  protected void setUp() throws Exception {
    conf = new Configuration();
    // Start a DFS cluster with NUM_DATANODES datanodes, formatting the namenode.
    dfs = new MiniDFSCluster(conf, NUM_DATANODES, true, null);
    dfs.waitActive();
    fileSys = dfs.getFileSystem();
    namenode = fileSys.getUri().toString();
    FileSystem.setDefaultUri(conf, namenode);
    // 4 task trackers, numDir = 3. The hang happens inside this constructor.
    mr = new MiniMRCluster(4, namenode, 3);
    jobTrackerName = "localhost:" + mr.getJobTrackerPort();
  }

  protected void tearDown() {
    // Stop MapReduce before the DFS it uses; guard against a failed setUp().
    if (mr != null) mr.shutdown();
    if (dfs != null) dfs.shutdown();
  }

  public void testStuck() throws Exception {
    System.out.println("Done");
  }
}
{code}
This also gets stuck in setUp(), so I think the problem is outside RAID. In fact, right after this, I tried running a test under contrib/streaming, and it gets stuck the same way. {code} ant test -Dtestcase=TestFileArgs -Dtest.output=yes {code} The output: {code} [junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:59339 [junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:59339 [junit] 11/01/27 09:42:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s). [junit] 11/01/27 09:42:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s). [junit] 11/01/27 09:42:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s). {code} Can someone try killing TestBlockFixer and running TestFileArgs on the machine that's running Hudson?
TestBlockFixer hangs initializing MiniMRCluster --- Key: MAPREDUCE-2283 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.23.0 Reporter: Nigel Daley Priority: Blocker Fix For: 0.22.0 TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2283: --- Attachment: MAPREDUCE-2283.patch This patch enables a timeout for RAID tests. It does not fix the MiniMRCluster problem, though. {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.hadoop.raid.TestBlockFixer FAILED (timeout) BUILD FAILED /data/users/rvadali/apache/hadoop-mapred-trunk/build.xml:821: The following error occurred while executing this line: /data/users/rvadali/apache/hadoop-mapred-trunk/build.xml:805: The following error occurred while executing this line: /data/users/rvadali/apache/hadoop-mapred-trunk/src/contrib/build.xml:60: The following error occurred while executing this line: /data/users/rvadali/apache/hadoop-mapred-trunk/src/contrib/raid/build.xml:60: Tests failed! {code} TestBlockFixer hangs initializing MiniMRCluster --- Key: MAPREDUCE-2283 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.23.0 Reporter: Nigel Daley Priority: Blocker Fix For: 0.22.0 Attachments: MAPREDUCE-2283.patch TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson
[jira] Created: (MAPREDUCE-2285) MiniMRCluster does not start after ant test-patch
MiniMRCluster does not start after ant test-patch - Key: MAPREDUCE-2285 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2285 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Ramkumar Vadali Any test using MiniMRCluster hangs in the MiniMRCluster constructor after running ant test-patch. Steps to reproduce: 1. ant -Dpatch.file=<dummy patch to CHANGES.txt> -Dforrest.home=<path to forrest> -Dfindbugs.home=<path to findbugs> -Dscratch.dir=/tmp/testpatch -Djava5.home=<path to java5> test-patch 2. Run any test that creates a MiniMRCluster, say ant test -Dtestcase=TestFileArgs (contrib/streaming). Expected result: the test should succeed. Actual result: the test hangs in MiniMRCluster.init. This does not happen if we run ant clean after ant test-patch. Test output: {code} [junit] 11/01/27 12:11:43 INFO ipc.Server: IPC Server handler 3 on 58675: starting [junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:58675 [junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:58675 [junit] 11/01/27 12:11:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s). [junit] 11/01/27 12:11:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s). [junit] 11/01/27 12:11:46 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s). [junit] 11/01/27 12:11:47 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 3 time(s). [junit] 11/01/27 12:11:48 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 4 time(s). [junit] 11/01/27 12:11:49 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 5 time(s). [junit] 11/01/27 12:11:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 6 time(s). 
[junit] 11/01/27 12:11:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 7 time(s). [junit] 11/01/27 12:11:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 8 time(s). [junit] 11/01/27 12:11:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 9 time(s). [junit] 11/01/27 12:11:53 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not available yet, Z... {code} Stack trace: {code} at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:611) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:429) - locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:504) - locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:206) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164) at org.apache.hadoop.ipc.Client.call(Client.java:1008) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198) at org.apache.hadoop.mapred.$Proxy11.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:235) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:275) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:206) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:185) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:169) at org.apache.hadoop.mapred.TaskTracker$2.run(TaskTracker.java:699) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:695) - locked 0x7f3b8ccc3870 (a org.apache.hadoop.mapred.TaskTracker) at 
org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1391) at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.createTaskTracker(MiniMRCluster.java:219) at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner$1.run(MiniMRCluster.java:203) at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner$1.run(MiniMRCluster.java:201) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142) at
[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987733#action_12987733 ] Ramkumar Vadali commented on MAPREDUCE-2283: Jira for MiniMRCluster problem: MAPREDUCE-2285 TestBlockFixer hangs initializing MiniMRCluster --- Key: MAPREDUCE-2283 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.23.0 Reporter: Nigel Daley Priority: Blocker Fix For: 0.22.0 Attachments: MAPREDUCE-2283.patch TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson
[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987303#action_12987303 ] Ramkumar Vadali commented on MAPREDUCE-2283: I think this has something to do with the MR ports change. Is it possible that Hudson does not do ant clean? This test has not changed recently, but MiniMRCluster has. TestBlockFixer hangs initializing MiniMRCluster --- Key: MAPREDUCE-2283 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.23.0 Reporter: Nigel Daley Priority: Blocker Fix For: 0.22.0 TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson
[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987399#action_12987399 ] Ramkumar Vadali commented on MAPREDUCE-2283: Please turn it off while I figure out what's happening. TestBlockFixer hangs initializing MiniMRCluster --- Key: MAPREDUCE-2283 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.23.0 Reporter: Nigel Daley Priority: Blocker Fix For: 0.22.0 TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Attachment: MAPREDUCE-2250.2.patch Pulling in a test fix from another jira. Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.2.patch, MAPREDUCE-2250.patch There are quite a few error messages being logged with a log level of info. That should be fixed to help debugging.
[jira] Commented: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986049#action_12986049 ] Ramkumar Vadali commented on MAPREDUCE-2250: TEST RESULTS: {code} [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] {code} {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.TestRaidDfs [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 498.696 sec [junit] Running org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 153.311 sec [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 969.737 sec [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.785 sec [junit] Running org.apache.hadoop.raid.TestErasureCodes [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 17.575 sec [junit] Running 
org.apache.hadoop.raid.TestGaloisField [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.295 sec [junit] Running org.apache.hadoop.raid.TestHarIndexParser [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.038 sec [junit] Running org.apache.hadoop.raid.TestRaidFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 14.459 sec [junit] Running org.apache.hadoop.raid.TestRaidHar [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 65.327 sec [junit] Running org.apache.hadoop.raid.TestRaidNode [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 512.67 sec [junit] Running org.apache.hadoop.raid.TestRaidPurge [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 254.251 sec [junit] Running org.apache.hadoop.raid.TestRaidShell [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 41.865 sec [junit] Running org.apache.hadoop.raid.TestRaidShellFsck [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 257.72 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.654 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.779 sec test: BUILD SUCCESSFUL {code} Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.2.patch, MAPREDUCE-2250.patch There are quite a few error messages being logged with a log level of info. That should be fixed to help debugging.
[jira] Updated: (MAPREDUCE-2279) Improper byte - int conversion in DistributedRaidFileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2279: --- Attachment: MAPREDUCE-2279.1.patch Minor optimization in read(). Improper byte - int conversion in DistributedRaidFileSystem Key: MAPREDUCE-2279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2279 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2279.1.patch, MAPREDUCE-2279.patch When returning a byte value from DistributedRaidFileSystem.read(), we should return 0xff & byteVal. Otherwise, bytes with the high bit set are sign-extended and the returned int value is incorrectly negative. This is a regression from MAPREDUCE-2248.
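The sign-extension problem can be seen in isolation in the sketch below. `InputStream.read()` is specified to return a value from 0 to 255, or -1 at end of stream, but Java's `byte` is signed, so widening a byte such as `0xFF` directly yields -1, which a caller would treat as end of stream. `ByteToIntDemo` is a hypothetical illustration, not the DistributedRaidFileSystem code.

```java
/** Hypothetical demo of byte-to-int widening for InputStream.read(). */
final class ByteToIntDemo {
  /** Buggy: widening sign-extends, so (byte) 0xFF becomes -1 (looks like EOF). */
  static int widenSigned(byte b) {
    return b;
  }

  /** Correct: masking keeps only the low 8 bits, giving a value in 0..255. */
  static int widenUnsigned(byte b) {
    return 0xff & b;
  }

  public static void main(String[] args) {
    byte b = (byte) 0xFF;
    System.out.println(widenSigned(b));   // prints -1
    System.out.println(widenUnsigned(b)); // prints 255
  }
}
```

Only bytes with the high bit set (0x80 to 0xFF) are affected, which is why the bug can go unnoticed with ASCII test data.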
[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
[ https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2267: --- Attachment: MAPREDUCE-2267.3.patch Fixed RaidShell to read a corrupt file through DistributedRaidFileSystem instead of invoking the recoverFile RPC. Parallelize reading of blocks within a stripe - Key: MAPREDUCE-2267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, MAPREDUCE-2267.3.patch, MAPREDUCE-2267.patch RAID code has several instances where several blocks of data have to be read to perform an operation. For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks from the source file. Currently, these read operations proceed sequentially. RAID code should use a thread pool to increase the parallelism and thus reduce latency.
[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
[ https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2267: --- Attachment: MAPREDUCE-2267.4.patch Attached a diff generated from the top level. Parallelize reading of blocks within a stripe - Key: MAPREDUCE-2267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch RAID code has several instances where several blocks of data have to be read to perform an operation. For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks from the source file. Currently, these read operations proceed sequentially. RAID code should use a thread pool to increase the parallelism and thus reduce latency.
[jira] Commented: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
[ https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984354#action_12984354 ] Ramkumar Vadali commented on MAPREDUCE-2267: TEST RESULTS {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.TestRaidDfs [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 341.649 sec [junit] Running org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 239.963 sec [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 880.943 sec [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.681 sec [junit] Running org.apache.hadoop.raid.TestErasureCodes [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 18.833 sec [junit] Running org.apache.hadoop.raid.TestGaloisField [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.293 sec [junit] Running org.apache.hadoop.raid.TestHarIndexParser [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.037 sec [junit] Running org.apache.hadoop.raid.TestParallelReader [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.141 sec [junit] Running org.apache.hadoop.raid.TestRaidFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.981 sec [junit] Running org.apache.hadoop.raid.TestRaidHar [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 70.121 sec [junit] Running org.apache.hadoop.raid.TestRaidNode [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 547.15 sec [junit] Running org.apache.hadoop.raid.TestRaidPurge [junit] 
Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 137.672 sec [junit] Running org.apache.hadoop.raid.TestRaidShell [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 22.473 sec [junit] Running org.apache.hadoop.raid.TestRaidShellFsck [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 266.466 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.016 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.729 sec test: BUILD SUCCESSFUL Total time: 43 minutes 5 seconds {code} {code} [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] {code} Parallelize reading of blocks within a stripe - Key: MAPREDUCE-2267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch RAID code has several instances where several blocks of data have to be read to perform an operation. 
For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks from the source file.
[jira] Updated: (MAPREDUCE-2279) Improper byte - int conversion in DistributedRaidFileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2279: --- Attachment: MAPREDUCE-2279.patch Improper byte - int conversion in DistributedRaidFileSystem Key: MAPREDUCE-2279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2279 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2279.patch When returning a byte value from DistributedRaidFileSystem.read(), we should return 0xff & byteVal. Otherwise, bytes with the high bit set are sign-extended and the returned int value is incorrectly negative. This is a regression from MAPREDUCE-2248.
[jira] Updated: (MAPREDUCE-2274) Generalize block fixer scheduler options
[ https://issues.apache.org/jira/browse/MAPREDUCE-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2274: --- Description: The Raid block fixer currently allows the specification of the fair scheduler pool name. This is not generic since it assumes usage of the fair scheduler. Also this does not allow multiple options to be set, just the pool name. This is similar to MAPREDUCE-1818 (was: The Raid block fixer currently allows the specification of the fair scheduler pool name. This is not generic since it assumes usage of the fair scheduler. Also this does not allow multiple options to be set, just the pool name.) Generalize block fixer scheduler options Key: MAPREDUCE-2274 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2274 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali The Raid block fixer currently allows the specification of the fair scheduler pool name. This is not generic since it assumes usage of the fair scheduler. Also this does not allow multiple options to be set, just the pool name. This is similar to MAPREDUCE-1818
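A minimal sketch of the generalization, assuming the block fixer would forward every RAID configuration property under a common prefix into its job configuration (for example by calling Hadoop's `Configuration.set(key, value)` for each entry). The prefix `raid.blockfix.jobopt.` and the `SchedulerOptions` class are hypothetical, and a plain `Map` stands in for the Hadoop `Configuration`.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: instead of a single fair-scheduler pool name, collect
 * every property under a configurable prefix as a job option.
 */
final class SchedulerOptions {
  /** Hypothetical prefix, chosen for illustration only. */
  static final String PREFIX = "raid.blockfix.jobopt.";

  /** Returns the options found under PREFIX, with the prefix stripped. */
  static Map<String, String> extract(Map<String, String> raidConf) {
    Map<String, String> jobOptions = new TreeMap<>();
    for (Map.Entry<String, String> e : raidConf.entrySet()) {
      String key = e.getKey();
      // Skip unrelated keys and a bare prefix with no option name after it.
      if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
        jobOptions.put(key.substring(PREFIX.length()), e.getValue());
      }
    }
    return jobOptions;
  }
}
```

Because the RAID code only copies keys and values, it no longer needs to know which scheduler is in use; a queue name such as `mapred.job.queue.name` and any scheduler-specific property can be set the same way, several at once.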
[jira] Updated: (MAPREDUCE-2274) Generalize block fixer scheduler options
[ https://issues.apache.org/jira/browse/MAPREDUCE-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2274: --- Priority: Minor (was: Major) Generalize block fixer scheduler options Key: MAPREDUCE-2274 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2274 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Minor The Raid block fixer currently allows the specification of the fair scheduler pool name. This is not generic since it assumes usage of the fair scheduler. Also this does not allow multiple options to be set, just the pool name. This is similar to MAPREDUCE-1818
[jira] Created: (MAPREDUCE-2275) RaidNode should monitor and fix blocks that violate RAID block placement
RaidNode should monitor and fix blocks that violate RAID block placement - Key: MAPREDUCE-2275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali When files are RAIDed, it is important to keep the blocks in each RAID stripe and the corresponding parity blocks on as many different machines as possible. This minimizes the probability of data loss when DataNodes die. BlockPlacementPolicyRaid ensures that parity blocks are not located on the same machines as the source blocks, but source block placement is not controlled directly in this manner. Instead, source blocks are created using the default policy. After a source file is RAIDed, its replication is increased and then decreased, and BlockPlacementPolicyRaid tries to keep the source blocks well placed when the excess replicas are deleted. This does not guarantee correct block placement for RAID. Also, if blocks are moved around by the balancer, the placement could be violated. We need periodic monitoring of the block placement of RAIDed files and their corresponding parity blocks.
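A minimal sketch of the check a periodic monitor could run for one stripe, assuming the monitor already knows which hosts hold replicas of each source and parity block in that stripe. StripePlacementChecker and its input format are hypothetical. A host holding replicas of two or more blocks of the same stripe is reported, since losing that host would remove more than one block of the stripe at once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical checker: each entry of blockHosts is the set of hosts
 *  holding replicas of one block (source or parity) of a single stripe. */
class StripePlacementChecker {
  /** Returns the hosts that hold replicas of two or more blocks of the stripe. */
  static Set<String> violatingHosts(List<Set<String>> blockHosts) {
    Map<String, Integer> blocksPerHost = new HashMap<>();
    for (Set<String> hosts : blockHosts) {
      for (String host : hosts) {
        // Each host is counted at most once per block, because hosts is a set.
        blocksPerHost.merge(host, 1, Integer::sum);
      }
    }
    Set<String> violations = new TreeSet<>();
    for (Map.Entry<String, Integer> e : blocksPerHost.entrySet()) {
      if (e.getValue() > 1) {
        violations.add(e.getKey());
      }
    }
    return violations;
  }
}
```

A fixer could then move one of the colocated replicas off each reported host.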
[jira] Created: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
Parallelize reading of blocks within a stripe - Key: MAPREDUCE-2267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2267.patch RAID code has several places where multiple blocks of data have to be read to perform an operation. For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks from the source file. Currently, these read operations proceed sequentially. RAID code should use a thread pool to increase parallelism and thus reduce latency.
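A minimal sketch of the thread-pool approach, assuming each block read can be wrapped in a Callable<byte[]>. The class ParallelStripeReader is hypothetical, and the lambdas used to exercise it stand in for real HDFS reads. ExecutorService.invokeAll submits all reads at once and returns the futures in input order, so the blocks stay in stripe order. The XOR step mirrors a simple XOR parity code and only illustrates what happens once every block has arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical reader that fetches all blocks of a stripe concurrently. */
class ParallelStripeReader {
  private final ExecutorService pool;

  ParallelStripeReader(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Runs every block read in parallel and returns the results in input order. */
  List<byte[]> readAll(List<Callable<byte[]>> blockReads)
      throws InterruptedException, ExecutionException {
    List<Future<byte[]>> futures = pool.invokeAll(blockReads); // waits for all
    List<byte[]> blocks = new ArrayList<>();
    for (Future<byte[]> f : futures) {
      blocks.add(f.get()); // a failed read surfaces as ExecutionException
    }
    return blocks;
  }

  /** XOR parity over equally sized blocks, as in a simple XOR code. */
  static byte[] xorParity(List<byte[]> blocks) {
    byte[] parity = new byte[blocks.get(0).length];
    for (byte[] b : blocks) {
      for (int i = 0; i < parity.length; i++) {
        parity[i] ^= b[i];
      }
    }
    return parity;
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

With reads of {1, 2}, {4, 8} and {16, 32}, the parity is {21, 42}. The total latency is roughly that of the slowest read rather than the sum of all reads.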
[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
[ https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2267: --- Attachment: MAPREDUCE-2267.patch Parallelize reading of blocks within a stripe - Key: MAPREDUCE-2267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2267.patch RAID code has several places where multiple blocks of data have to be read to perform an operation. For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks from the source file. Currently, these read operations proceed sequentially. RAID code should use a thread pool to increase parallelism and thus reduce latency.
[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe
[ https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2267: --- Attachment: MAPREDUCE-2267.1.patch Fixing failures found during tests. Parallelize reading of blocks within a stripe - Key: MAPREDUCE-2267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.patch RAID code has several places where multiple blocks of data have to be read to perform an operation. For example, computing a parity block requires reading the blocks of the source file. Similarly, generating a fixed block requires reading a parity block and the good blocks from the source file. Currently, these read operations proceed sequentially. RAID code should use a thread pool to increase parallelism and thus reduce latency.
[jira] Commented: (MAPREDUCE-2239) BlockPlacementPolicyRaid should call getBlockLocations only when necessary
[ https://issues.apache.org/jira/browse/MAPREDUCE-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981583#action_12981583 ] Ramkumar Vadali commented on MAPREDUCE-2239: Do you need to change FSNamesystem.LOG.debug -> FSNamesystem.LOG.info? BlockPlacementPolicyRaid should call getBlockLocations only when necessary -- Key: MAPREDUCE-2239 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2239 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.23.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.23.0 Attachments: MAPREDUCE-2239.txt Currently BlockPlacementPolicyRaid calls getBlockLocations for every chooseTarget(). This puts pressure on the NameNode. We should avoid the call if the file is neither raided nor a parity file.
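A minimal sketch of the proposed guard, assuming parity files can be recognized by a path prefix (a prefix such as "/raid/" is an assumption here) and that the set of raided source files is available without a NameNode round trip. RaidPlacementGuard and its constructor arguments are hypothetical; the Function passed in stands in for the expensive getBlockLocations call and is not the real NameNode API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical guard that skips block-location lookups for ordinary files. */
class RaidPlacementGuard {
  private final List<String> parityPrefixes;                       // assumed, e.g. "/raid/"
  private final Set<String> raidedSources;                         // files known to have parity
  private final Function<String, List<String>> getBlockLocations;  // stand-in for the costly call

  RaidPlacementGuard(List<String> parityPrefixes, Set<String> raidedSources,
                     Function<String, List<String>> getBlockLocations) {
    this.parityPrefixes = parityPrefixes;
    this.raidedSources = raidedSources;
    this.getBlockLocations = getBlockLocations;
  }

  boolean isParityFile(String path) {
    for (String prefix : parityPrefixes) {
      if (path.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  /** Looks up locations only for raided source files and parity files. */
  List<String> locationsFor(String path) {
    if (!isParityFile(path) && !raidedSources.contains(path)) {
      return Collections.emptyList(); // ordinary file: no lookup needed
    }
    return getBlockLocations.apply(path);
  }
}
```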
[jira] Commented: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980951#action_12980951 ] Ramkumar Vadali commented on MAPREDUCE-2248: TEST RESULTS {code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 524.787 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 154.653 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 944.872 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 13.241 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 17.78 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.293 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.007 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 178.351 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 646.931 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 253.727 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 21.994 sec
[junit] Running org.apache.hadoop.raid.TestRaidShellFsck
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 270.783 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.14 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.769 sec
{code}
{code}
[exec]
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 4 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
[exec]
[exec]
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
{code}
DistributedRaidFileSystem should unraid only the corrupt block -- Key: MAPREDUCE-2248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Fix For: 0.23.0 Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. It is better to unraid just the corrupt block and use the rest of the file as normal. This becomes really important when we have terabyte-sized files.
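A minimal sketch of the arithmetic behind unraiding only one block, assuming fixed-size blocks: the offset of the failed read identifies the single block, and therefore the single stripe, that has to be reconstructed. CorruptBlockRange is a hypothetical helper and is not part of DistributedRaidFileSystem.

```java
/** Hypothetical description of the one block that needs reconstruction. */
class CorruptBlockRange {
  final long blockIndex; // index of the block within the file
  final long start;      // file offset of the block's first byte
  final long length;     // bytes in the block (the last block may be short)

  CorruptBlockRange(long blockIndex, long start, long length) {
    this.blockIndex = blockIndex;
    this.start = start;
    this.length = length;
  }

  /** Maps a failed read offset to the block that contains it. */
  static CorruptBlockRange forOffset(long offset, long blockSize, long fileLength) {
    if (offset < 0 || offset >= fileLength) {
      throw new IllegalArgumentException("offset outside file: " + offset);
    }
    long index = offset / blockSize;
    long start = index * blockSize;
    long length = Math.min(blockSize, fileLength - start);
    return new CorruptBlockRange(index, start, length);
  }

  /** Stripe containing this block, for stripes of stripeLength source blocks. */
  long stripeIndex(int stripeLength) {
    return blockIndex / stripeLength;
  }
}
```

For a 300-byte file with 128-byte blocks, a failed read at offset 290 maps to block 2, covering bytes 256 through 299 (44 bytes); only that range has to be recovered from parity, and the rest of the file is read as usual.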
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Status: Open (was: Patch Available) Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Attachment: MAPREDUCE-2250.1.patch Update after svn up Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Created: (MAPREDUCE-2250) Fix log levels for error messages
Fix log levels for error messages - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Updated: (MAPREDUCE-2250) Fix log levels for error messages
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Attachment: MAPREDUCE-2250.patch Fixes logging Fix log levels for error messages - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.patch Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Summary: Fix logging in raid code. (was: Fix log levels for error messages) Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.patch Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Status: Patch Available (was: Open) Review at https://reviews.apache.org/r/266/ Fix logging in raid code. - Key: MAPREDUCE-2250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Trivial Attachments: MAPREDUCE-2250.patch Quite a few error messages are being logged at the info level. This should be fixed to make debugging easier.
[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2248: --- Attachment: MAPREDUCE-2248.1.patch Addressed Scott's comments DistributedRaidFileSystem should unraid only the corrupt block -- Key: MAPREDUCE-2248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. It is better to unraid just the corrupt block and use the rest of the file as normal. This becomes really important when we have terabyte-sized files.
[jira] Created: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
DistributedRaidFileSystem should unraid only the corrupt block -- Key: MAPREDUCE-2248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. It is better to unraid just the corrupt block and use the rest of the file as normal. This becomes really important when we have terabyte-sized files.
[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2248: --- Attachment: MAPREDUCE-2248.patch review at https://reviews.apache.org/r/217/ DistributedRaidFileSystem should unraid only the corrupt block -- Key: MAPREDUCE-2248 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2248.patch DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. It is better to unraid just the corrupt block and use the rest of the file as normal. This becomes really important when we have terabyte-sized files.
[jira] Created: (MAPREDUCE-2245) Failure metrics for block fixer
Failure metrics for block fixer --- Key: MAPREDUCE-2245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2245 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Priority: Minor Publish file fixing failure metrics for the block fixer.
[jira] Created: (MAPREDUCE-2246) Timeout for fixing a file
Timeout for fixing a file - Key: MAPREDUCE-2246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2246 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali If the DistBlockFixer takes a long time to fix a file, it would be better to time out and try again in a new MR job.
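A minimal sketch of a per-file timeout, assuming the fix for one file can be run as a Runnable. TimedFileFixer is hypothetical: it waits with Future.get(timeout, unit), cancels the fix if the timeout expires, and records the path so that a later MR job could retry it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper that bounds the time spent fixing one file. */
class TimedFileFixer {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final List<String> retryLater = new ArrayList<>();

  /** Returns true if the fix finished within the timeout. */
  boolean fix(String path, Runnable fixTask, long timeout, TimeUnit unit) {
    Future<?> f = executor.submit(fixTask);
    try {
      f.get(timeout, unit);
      return true;
    } catch (TimeoutException e) {
      f.cancel(true);       // interrupt the slow fix
      retryLater.add(path); // hand the file to a later job
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      f.cancel(true);
      retryLater.add(path);
      return false;
    } catch (ExecutionException e) {
      throw new RuntimeException("Fixing " + path + " failed", e.getCause());
    }
  }

  List<String> filesToRetry() {
    return retryLater;
  }

  void shutdown() {
    executor.shutdownNow();
  }
}
```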
[jira] Created: (MAPREDUCE-2240) DistBlockFixer could sleep indefinitely
DistBlockFixer could sleep indefinitely --- Key: MAPREDUCE-2240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2240 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali DistributedBlockFixer computes its sleep interval based on the amount of time spent in fixing jobs. This computation has a bug that can make the sleep interval negative, which would make the distributed block fixer sleep indefinitely.
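A minimal sketch of a guarded interval computation, assuming the fixer aims to poll once per fixed period and subtracts the time spent on fixing jobs from that period. The name FixerSleep and the period-based formula are assumptions, not the actual DistributedBlockFixer code. Clamping with Math.max keeps the interval from going negative when the fixing time exceeds the period.

```java
/** Hypothetical helper for the block fixer's polling interval. */
class FixerSleep {
  /** Time left in the polling period after busyMillis spent on fixing,
   *  never less than zero. */
  static long sleepMillis(long periodMillis, long busyMillis) {
    long remaining = periodMillis - busyMillis;
    return Math.max(0L, remaining);
  }
}
```

With a 60-second period, 1 second of fixing leaves a 59-second sleep, while 90 seconds of fixing yields 0 and the next round starts immediately.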
[jira] Commented: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched
[ https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970534#action_12970534 ] Ramkumar Vadali commented on MAPREDUCE-2214: TEST RESULTS ant test-patch complains about unit tests, but it's difficult to come up with a unit test for this. {code}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
[exec]
[exec]
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
[exec]
{code}
ant test: there was only one test failure, but that test also fails in a clean checkout. {code}
[junit] Test org.apache.hadoop.mapred.TestControlledMapReduceJob FAILED (timeout)
{code}
TaskTracker should release slot if task is not launched --- Key: MAPREDUCE-2214 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2214.patch TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not in an expected state. However, in the case where the task is not launched, the slot is not released. We have observed this in production - the task was in SUCCEEDED state by the time launchTask() got to it and then the slot was never released. It is not clear how the task got into that state, but it is better to handle the case.
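A minimal sketch of the proposed handling that models only the slot accounting. SlotTracker and its State enum are hypothetical simplifications of TaskTracker.TaskInProgress, and the check for a launchable state is reduced to UNASSIGNED. When launchTask() declines to start a task, it gives back the slot that was reserved for it instead of leaking it.

```java
/** Hypothetical model of slot accounting around launchTask(). */
class SlotTracker {
  enum State { UNASSIGNED, RUNNING, SUCCEEDED, FAILED, KILLED }

  private int freeSlots;

  SlotTracker(int slots) {
    this.freeSlots = slots;
  }

  int freeSlots() {
    return freeSlots;
  }

  void reserveSlot() {
    if (freeSlots == 0) {
      throw new IllegalStateException("no free slots");
    }
    freeSlots--;
  }

  void releaseSlot() {
    freeSlots++;
  }

  /** Assumes the caller already reserved a slot for this task. */
  boolean launchTask(State current) {
    if (current != State.UNASSIGNED) {
      // Not launchable (e.g. already SUCCEEDED): return the reserved slot.
      releaseSlot();
      return false;
    }
    // ... the real tracker would start the task here ...
    return true;
  }
}
```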
[jira] Updated: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched
[ https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2214: --- Attachment: MAPREDUCE-2214.patch TaskTracker should release slot if task is not launched --- Key: MAPREDUCE-2214 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2214.patch TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not in an expected state. However, in the case where the task is not launched, the slot is not released. We have observed this in production - the task was in SUCCEEDED state by the time launchTask() got to it and then the slot was never released. It is not clear how the task got into that state, but it is better to handle the case.
[jira] Updated: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched
[ https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2214: --- Status: Patch Available (was: Open) TaskTracker should release slot if task is not launched --- Key: MAPREDUCE-2214 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2214.patch TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not in an expected state. However, in the case where the task is not launched, the slot is not released. We have observed this in production - the task was in SUCCEEDED state by the time launchTask() got to it and then the slot was never released. It is not clear how the task got into that state, but it is better to handle the case.
[jira] Commented: (MAPREDUCE-1831) BlockPlacement policy for RAID
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969887#action_12969887 ] Ramkumar Vadali commented on MAPREDUCE-1831: +1 looks good. Please run unit tests and test-patch. BlockPlacement policy for RAID -- Key: MAPREDUCE-1831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.23.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.23.0 Attachments: MAPREDUCE-1831-v2.txt, MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt RAID introduces a new dependency between blocks within a file: the blocks help decode each other. Therefore we should avoid putting them on the same machine. The proposed BlockPlacementPolicy does the following: 1. When writing parity blocks, it avoids placing the parity blocks on the same machines as the source blocks. 2. When reducing the replication factor, it deletes the replicas that sit with other dependent blocks. 3. It does not change the way normal files are written; it behaves differently only when processing RAID files.
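A minimal sketch of rule 2, assuming the policy knows how many other blocks of the same stripe (source or parity) each host already stores. RaidReplicaChooser is hypothetical and is not the BlockPlacementPolicyRaid implementation attached to this issue. It deletes the replica on the most crowded host and breaks ties by list order.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical chooser for the replica to drop when replication is reduced. */
class RaidReplicaChooser {
  /** Picks the replica whose host already stores the most other blocks
   *  of the same stripe; hosts missing from the map count as zero. */
  static String chooseReplicaToDelete(List<String> replicaHosts,
                                      Map<String, Integer> stripeBlocksOnHost) {
    String victim = null;
    int most = -1;
    for (String host : replicaHosts) {
      int companions = stripeBlocksOnHost.getOrDefault(host, 0);
      if (companions > most) {
        most = companions;
        victim = host;
      }
    }
    return victim;
  }
}
```

Removing the replica that shares a host with the most dependent blocks lowers the number of stripe blocks that a single machine failure can take out.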
[jira] Created: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched
TaskTracker should release slot if task is not launched --- Key: MAPREDUCE-2214 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not in an expected state. However, in the case where the task is not launched, the slot is not released. We have observed this in production - the task was in SUCCEEDED state by the time launchTask() got to it and then the slot was never released. It is not clear how the task got into that state, but it is better to handle the case.
[jira] Commented: (MAPREDUCE-2156) Raid-aware FSCK
[ https://issues.apache.org/jira/browse/MAPREDUCE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965744#action_12965744 ] Ramkumar Vadali commented on MAPREDUCE-2156: +1, looks good. Raid-aware FSCK --- Key: MAPREDUCE-2156 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2156 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.23.0 Reporter: Patrick Kling Assignee: Patrick Kling Fix For: 0.23.0 Attachments: MAPREDUCE-2156.2.patch, MAPREDUCE-2156.patch Currently, FSCK reports files as corrupt even if they can be fixed using parity blocks. We need a tool that only reports files that are irreparably corrupt (i.e., files for which too many data or parity blocks belonging to the same stripe have been lost or corrupted).
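A minimal sketch of the reporting rule, assuming the tool has already counted the lost data and parity blocks in each stripe. With a simple XOR code, the single parity block of a stripe can rebuild any one lost block; a Reed-Solomon code with r parity blocks can rebuild up to r lost blocks per stripe. RaidFsck and its input format are hypothetical.

```java
/** Hypothetical RAID-aware corruption check for one file. */
class RaidFsck {
  /**
   * missingPerStripe[i] is the number of lost data plus parity blocks in
   * stripe i; parityLength is how many losses per stripe the code can repair
   * (1 for a simple XOR code, r for Reed-Solomon with r parity blocks).
   */
  static boolean irreparablyCorrupt(int[] missingPerStripe, int parityLength) {
    for (int missing : missingPerStripe) {
      if (missing > parityLength) {
        return true; // this stripe has lost more blocks than parity can rebuild
      }
    }
    return false; // every stripe is repairable, so do not report the file
  }
}
```

A file with one lost block in each of several stripes is still repairable under XOR parity, while two lost blocks in the same stripe make it irreparable.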
[jira] Commented: (MAPREDUCE-2155) RaidNode should optionally dispatch map reduce jobs to fix corrupt blocks (instead of fixing locally)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12964904#action_12964904 ] Ramkumar Vadali commented on MAPREDUCE-2155: +1. Latest patch looks good to me. RaidNode should optionally dispatch map reduce jobs to fix corrupt blocks (instead of fixing locally) - Key: MAPREDUCE-2155 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2155 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.23.0 Reporter: Patrick Kling Assignee: Patrick Kling Fix For: 0.23.0 Attachments: MAPREDUCE-2155.2.patch, MAPREDUCE-2155.patch Recomputing blocks based on parity information is expensive. Rather than doing this locally at the RaidNode, we should run MapReduce jobs. This will allow us to quickly fix a large number of corrupt or missing blocks.
[jira] Commented: (MAPREDUCE-1367) LocalJobRunner should support parallel mapper execution
[ https://issues.apache.org/jira/browse/MAPREDUCE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965054#action_12965054 ] Ramkumar Vadali commented on MAPREDUCE-1367: @Aaron, Just curious, is this being used in production? If so, could you please outline the use case? LocalJobRunner should support parallel mapper execution --- Key: MAPREDUCE-1367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1367 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-1367.2.patch, MAPREDUCE-1367.3.patch, MAPREDUCE-1367.4.patch, MAPREDUCE-1367.5.patch, MAPREDUCE-1367.6.patch, MAPREDUCE-1367.7.patch, MAPREDUCE-1367.patch The LocalJobRunner currently supports only a single execution thread. Given the prevalence of multi-core CPUs, it makes sense to allow users to run multiple tasks in parallel for improved performance on small (local-only) jobs.
[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run
[ https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-1783: --- Attachment: MAPREDUCE-1783.patch Patch after svn up Task Initialization should be delayed till when a job can be run Key: MAPREDUCE-1783 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Fix For: 0.22.0 Attachments: 0001-Pool-aware-job-initialization.patch, 0001-Pool-aware-job-initialization.patch.1, MAPREDUCE-1783.patch, submit-mapreduce-1783.patch The FairScheduler task scheduler uses PoolManager to impose limits on the number of jobs that can be running at a given time. However, submitted jobs are initialized immediately by EagerTaskInitializationListener, which calls JobInProgress.initTasks. This causes the job split file to be read into memory. The split information is not needed until the number of running jobs is less than the specified maximum. If the amount of split information is large, this leads to unnecessary memory pressure on the JobTracker. To ease memory pressure, FairScheduler can use another implementation of JobInProgressListener that is aware of PoolManager limits and can delay task initialization until the number of running jobs is below the maximum.
[jira] Commented: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run
[ https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934065#action_12934065 ] Ramkumar Vadali commented on MAPREDUCE-1783: Latest patch TEST RESULTS: One test fails, but that also fails on a clean checkout {code} [junit] Test org.apache.hadoop.mapred.TestControlledMapReduceJob FAILED (timeout) {code} ant test-patch succeeds: {code} [exec] [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD SUCCESSFUL Total time: 13 minutes 6 seconds Test results are in /tmp/rvadali.hadoopQA {code} Task Initialization should be delayed till when a job can be run Key: MAPREDUCE-1783 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Fix For: 0.22.0 Attachments: 0001-Pool-aware-job-initialization.patch, 0001-Pool-aware-job-initialization.patch.1, MAPREDUCE-1783.patch, submit-mapreduce-1783.patch The FairScheduler task scheduler uses PoolManager to impose limits on the number of jobs that can be running at a given time. 
However, jobs that are submitted are initialized immediately by EagerTaskInitializationListener, which calls JobInProgress.initTasks. This causes the job split file to be read into memory. The split information is not needed until the number of running jobs is less than the maximum specified. If the amount of split information is large, this leads to unnecessary memory pressure on the Job Tracker. To ease memory pressure, FairScheduler can use another implementation of JobInProgressListener that is aware of PoolManager limits and can delay task initialization until the number of running jobs is below the maximum.
[jira] Updated: (MAPREDUCE-2159) Provide metrics for RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2159: --- Attachment: MAPREDUCE-2159.patch Adding the classes RaidNodeMetrics and TestRaidNodeMetrics Provide metrics for RaidNode Key: MAPREDUCE-2159 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2159 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2159.patch It will be useful to have the following metrics for RAID: - files raided - files too new to be raided - files too small to be raided - number of blocks fixed using raid.
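The four metrics listed above can be modeled as a set of counters. This sketch is not the RaidNodeMetrics class from the patch, which would publish through Hadoop's metrics framework. The class `RaidCounters`, the age and block-count thresholds, and the order of the checks in `recordCandidate` are all assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters mirroring the metrics proposed in MAPREDUCE-2159.
class RaidCounters {
  enum Metric { FILES_RAIDED, FILES_TOO_NEW, FILES_TOO_SMALL, BLOCKS_FIXED }

  // Populated once in the constructor; AtomicLong makes updates thread-safe.
  private final Map<Metric, AtomicLong> counters = new EnumMap<>(Metric.class);

  RaidCounters() {
    for (Metric m : Metric.values()) counters.put(m, new AtomicLong());
  }

  void inc(Metric m, long delta) { counters.get(m).addAndGet(delta); }

  long get(Metric m) { return counters.get(m).get(); }

  // Hypothetical selection rule: files modified within minAgeMillis are
  // "too new"; files with fewer than minBlocks blocks are "too small".
  void recordCandidate(long ageMillis, long numBlocks,
                       long minAgeMillis, long minBlocks) {
    if (ageMillis < minAgeMillis) {
      inc(Metric.FILES_TOO_NEW, 1);
    } else if (numBlocks < minBlocks) {
      inc(Metric.FILES_TOO_SMALL, 1);
    } else {
      inc(Metric.FILES_RAIDED, 1);
    }
  }
}
```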
[jira] Updated: (MAPREDUCE-2159) Provide metrics for RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2159: --- Status: Patch Available (was: Open) Provide metrics for RaidNode Key: MAPREDUCE-2159 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2159 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2159.patch It will be useful to have the following metrics for RAID: - files raided - files too new to be raided - files too small to be raided - number of blocks fixed using raid.
[jira] Created: (MAPREDUCE-2189) RAID Parallel traversal needs to synchronize stats
RAID Parallel traversal needs to synchronize stats -- Key: MAPREDUCE-2189 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2189 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali The implementation of multi-threaded directory traversal does not update stats in a thread-safe manner.
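The bug class can be illustrated as follows: `count++` on a shared `long` is a read-modify-write, so concurrent worker threads can lose updates. One common fix is to make each update atomic. This sketch uses `AtomicLong`; the actual patch may use `synchronized` instead. `TraversalStats` and its fields are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical traversal statistics shared by worker threads.
class TraversalStats {
  private final AtomicLong filesScanned = new AtomicLong();
  private final AtomicLong dirsScanned = new AtomicLong();

  void fileSeen() { filesScanned.incrementAndGet(); } // atomic, no lost updates
  void dirSeen() { dirsScanned.incrementAndGet(); }
  long files() { return filesScanned.get(); }
  long dirs() { return dirsScanned.get(); }

  // Simulates several workers, each scanning one directory of N files.
  static TraversalStats simulate(int threads, int filesPerThread)
      throws InterruptedException {
    TraversalStats stats = new TraversalStats();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        stats.dirSeen();
        for (int i = 0; i < filesPerThread; i++) stats.fileSeen();
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return stats;
  }
}
```

With plain `long` fields and `++`, the same simulation would usually report fewer files than were actually scanned.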
[jira] Updated: (MAPREDUCE-2189) RAID Parallel traversal needs to synchronize stats
[ https://issues.apache.org/jira/browse/MAPREDUCE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2189: --- Attachment: MAPREDUCE-2189.patch RAID Parallel traversal needs to synchronize stats -- Key: MAPREDUCE-2189 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2189 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2189.patch The implementation of multi-threaded directory traversal does not update stats in a thread-safe manner.
[jira] Created: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API
Port DistRaid.java to new mapreduce API --- Key: MAPREDUCE-2184 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali DistRaid.java was implemented with the older mapred API; this task is for porting it to the new API.
[jira] Updated: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API
[ https://issues.apache.org/jira/browse/MAPREDUCE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2184: --- Component/s: contrib/raid Port DistRaid.java to new mapreduce API --- Key: MAPREDUCE-2184 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali DistRaid.java was implemented with the older mapred API; this task is for porting it to the new API.
[jira] Created: (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()
DistributedRaidFileSystem should implement getFileBlockLocations() -- Key: MAPREDUCE-2186 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali If a RAIDed file has missing blocks, DistributedRaidFileSystem.getFileBlockLocations() would return no block locations. This could lead a client to believe that the file is not readable. But if parity data is available, the file actually is readable. It would be better to implement getFileBlockLocations() and return the location of the parity blocks that would be needed to reconstruct the missing block.
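Which blocks are "needed to reconstruct the missing block" depends on the stripe layout. This sketch assumes a layout in which source blocks are grouped into stripes of `stripeLength` consecutive blocks, and stripe `s` owns parity blocks `[s * parityLength, (s + 1) * parityLength)`. Under that assumption, an XOR-style code needs the other source blocks in the stripe plus the stripe's parity block. `StripeLayout` and both parameters are hypothetical and are not taken from the RAID code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stripe layout used to find the blocks needed for recovery.
class StripeLayout {
  final int stripeLength;  // source blocks per stripe
  final int parityLength;  // parity blocks per stripe (1 for XOR)

  StripeLayout(int stripeLength, int parityLength) {
    this.stripeLength = stripeLength;
    this.parityLength = parityLength;
  }

  // Other source blocks in the same stripe as the missing block; the last
  // stripe may be shorter than stripeLength.
  List<Long> siblingSourceBlocks(long missing, long numSourceBlocks) {
    long start = (missing / stripeLength) * stripeLength;
    long end = Math.min(start + stripeLength, numSourceBlocks);
    List<Long> out = new ArrayList<>();
    for (long b = start; b < end; b++) {
      if (b != missing) out.add(b);
    }
    return out;
  }

  // Indices, within the parity file, of the parity blocks for that stripe.
  List<Long> parityBlocks(long missing) {
    long stripe = missing / stripeLength;
    List<Long> out = new ArrayList<>();
    for (long p = stripe * parityLength; p < (stripe + 1) * parityLength; p++) {
      out.add(p);
    }
    return out;
  }
}
```

For example, with `stripeLength = 5` and a 12-block file, a missing block 7 falls in stripe 1. Recovering it would read source blocks 5, 6, 8, and 9 plus parity block 1.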
[jira] Updated: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API
[ https://issues.apache.org/jira/browse/MAPREDUCE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2184: --- Status: Patch Available (was: Open) Port DistRaid.java to new mapreduce API --- Key: MAPREDUCE-2184 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2184.patch DistRaid.java was implemented with the older mapred API; this task is for porting it to the new API.
[jira] Updated: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API
[ https://issues.apache.org/jira/browse/MAPREDUCE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2184: --- Attachment: MAPREDUCE-2184.patch Review at https://reviews.apache.org/r/87/ TEST RESULTS: ant test: {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.TestRaidDfs [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 377.746 sec [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 133.511 sec [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 11.485 sec [junit] Running org.apache.hadoop.raid.TestErasureCodes [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.063 sec [junit] Running org.apache.hadoop.raid.TestGaloisField [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.396 sec [junit] Running org.apache.hadoop.raid.TestHarIndexParser [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec [junit] Running org.apache.hadoop.raid.TestRaidFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.265 sec [junit] Running org.apache.hadoop.raid.TestRaidHar [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 68.93 sec [junit] Running org.apache.hadoop.raid.TestRaidNode [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 455.365 sec [junit] Running org.apache.hadoop.raid.TestRaidPurge [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 215.837 sec [junit] Running org.apache.hadoop.raid.TestRaidShell [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.015 sec [junit] Running 
org.apache.hadoop.raid.TestReedSolomonDecoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 14.912 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.207 sec test: BUILD SUCCESSFUL Total time: 22 minutes 41 seconds {code} ant test-patch: The errors are the same as on a clean trunk checkout. {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 13 new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings). [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] {code} Port DistRaid.java to new mapreduce API --- Key: MAPREDUCE-2184 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2184.patch DistRaid.java was implemented with the older mapred API; this task is for porting it to the new API.
[jira] Updated: (MAPREDUCE-2169) Integrated Reed-Solomon code with RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2169: --- Attachment: MAPREDUCE-2169.2.patch TEST RESULTS: ant test under raid: {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.TestRaidDfs [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 373.594 sec [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 138.885 sec [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 15.061 sec [junit] Running org.apache.hadoop.raid.TestErasureCodes [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.491 sec [junit] Running org.apache.hadoop.raid.TestGaloisField [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.39 sec [junit] Running org.apache.hadoop.raid.TestHarIndexParser [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec [junit] Running org.apache.hadoop.raid.TestRaidFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.809 sec [junit] Running org.apache.hadoop.raid.TestRaidHar [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 69.229 sec [junit] Running org.apache.hadoop.raid.TestRaidNode [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 461.174 sec [junit] Running org.apache.hadoop.raid.TestRaidPurge [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 218.163 sec [junit] Running org.apache.hadoop.raid.TestRaidShell [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 24.31 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder [junit] Tests run: 1, 
Failures: 0, Errors: 0, Time elapsed: 14.96 sec [junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.368 sec test: BUILD SUCCESSFUL Total time: 22 minutes 53 seconds {code} ant test-patch has the same result as a clean checkout (see MAPREDUCE-2176): {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 28 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 13 new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings). [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] {code} Integrated Reed-Solomon code with RaidNode -- Key: MAPREDUCE-2169 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2169 Project: Hadoop Map/Reduce Issue Type: Task Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2169.2.patch, MAPREDUCE-2169.patch Scott Chen recently checked in an implementation of the Reed-Solomon code. This task will track the integration of the code with RaidNode.
[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node
[ https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2167: --- Attachment: MAPREDUCE-2167.4.patch Fixed a broken test. TEST RESULTS: ant test-patch has the same number of failures as a clean checkout {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 4 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 13 new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings). [exec] [exec] +1 system test framework. The patch passed system test framework compile. [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] {code} ant test succeeds: {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.TestRaidDfs [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 47.071 sec [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 124.583 sec [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.337 sec [junit] Running org.apache.hadoop.raid.TestErasureCodes [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.481 sec [junit] Running org.apache.hadoop.raid.TestGaloisField [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.392 sec [junit] Running org.apache.hadoop.raid.TestHarIndexParser [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec [junit] Running org.apache.hadoop.raid.TestRaidFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.485 sec [junit] Running org.apache.hadoop.raid.TestRaidHar [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 71.136 sec [junit] Running org.apache.hadoop.raid.TestRaidNode [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 471.072 sec [junit] Running org.apache.hadoop.raid.TestRaidPurge [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.828 sec [junit] Running org.apache.hadoop.raid.TestRaidShell [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec test: BUILD SUCCESSFUL Total time: 15 minutes 6 seconds {code} Faster directory traversal for raid node Key: MAPREDUCE-2167 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid 
Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, MAPREDUCE-2167.4.patch, MAPREDUCE-2167.patch The RaidNode currently iterates over the directory structure to figure out which files to RAID. With millions of files, this can take a long time, especially if some files are already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine if the file needs to be RAIDed. The directory traversal is encapsulated inside the class DirectoryTraversal, which examines one file at a time, using the caller's thread. My proposal is to make this multi-threaded as follows: * use a pool of threads inside DirectoryTraversal * The caller's thread is used to retrieve directories, and each new directory is assigned to a thread in the pool. The worker thread examines all the files in the directory. * If there are sub-directories, those are added back as work items to the pool. Comments?
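The proposed traversal can be sketched over a hypothetical in-memory namespace rather than HDFS. This is not the DirectoryTraversal code from the patches. Each directory becomes a pool task that examines its files and resubmits its sub-directories as new work items. A `Phaser` tracks outstanding directories so the caller knows when the walk has finished. The maps, the `needsRaid` predicate, and the class name are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Predicate;

// Hypothetical parallel traversal over dir -> subdirs and dir -> files maps.
// Single use: call run() once per instance.
class ParallelTraversal {
  private final Map<String, List<String>> subdirs;
  private final Map<String, List<String>> files;
  private final Predicate<String> needsRaid;  // hypothetical selection rule
  private final Queue<String> selected = new ConcurrentLinkedQueue<>();
  // Party 1 is the caller; each pending directory task registers one more.
  // A Phaser allows at most 65535 parties, which bounds pending directories.
  private final Phaser pending = new Phaser(1);
  private final ExecutorService pool;

  ParallelTraversal(Map<String, List<String>> subdirs,
                    Map<String, List<String>> files,
                    Predicate<String> needsRaid, int threads) {
    this.subdirs = subdirs;
    this.files = files;
    this.needsRaid = needsRaid;
    this.pool = Executors.newFixedThreadPool(threads);
  }

  // Walks the tree from root and returns the files accepted by needsRaid.
  Queue<String> run(String root) {
    submit(root);
    pending.arriveAndAwaitAdvance(); // returns once every directory task is done
    pool.shutdown();
    return selected;
  }

  private void submit(String dir) {
    // Register before the parent deregisters, so the phase cannot end early.
    pending.register();
    pool.execute(() -> {
      try {
        for (String f : files.getOrDefault(dir, List.of())) {
          if (needsRaid.test(f)) selected.add(f);
        }
        for (String sub : subdirs.getOrDefault(dir, List.of())) {
          submit(sub); // sub-directories go back to the pool as work items
        }
      } finally {
        pending.arriveAndDeregister();
      }
    });
  }
}
```

The registration order is the key design point here. Because a child is registered before its parent deregisters, the count of outstanding directories never drops to zero while work remains.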
[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node
[ https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2167: --- Attachment: MAPREDUCE-2167.3.patch Added a comment explaining the use of the slots semaphore. Faster directory traversal for raid node Key: MAPREDUCE-2167 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, MAPREDUCE-2167.patch The RaidNode currently iterates over the directory structure to figure out which files to RAID. With millions of files, this can take a long time, especially if some files are already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine if the file needs to be RAIDed. The directory traversal is encapsulated inside the class DirectoryTraversal, which examines one file at a time, using the caller's thread. My proposal is to make this multi-threaded as follows: * use a pool of threads inside DirectoryTraversal * The caller's thread is used to retrieve directories, and each new directory is assigned to a thread in the pool. The worker thread examines all the files in the directory. * If there are sub-directories, those are added back as work items to the pool. Comments?
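The patch text does not describe what the "slots semaphore" does. One plausible reading is a counting semaphore that bounds how many directories are in flight at once, so the caller's thread blocks instead of queuing unbounded work. This sketch shows that interpretation only. `BoundedSubmitter` and its parameters are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bounded submitter: the caller takes a slot before handing work
// to the pool, and the worker returns the slot when it finishes.
class BoundedSubmitter {
  private final Semaphore slots;
  private final ExecutorService pool;
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger maxObserved = new AtomicInteger();

  BoundedSubmitter(int threads, int maxSlots) {
    this.slots = new Semaphore(maxSlots);
    this.pool = Executors.newFixedThreadPool(threads);
  }

  void submit(Runnable work) throws InterruptedException {
    slots.acquire(); // blocks the caller while all slots are taken
    try {
      pool.execute(() -> {
        maxObserved.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        try {
          work.run();
        } finally {
          inFlight.decrementAndGet();
          slots.release(); // return the slot even if the work throws
        }
      });
    } catch (RuntimeException e) {
      slots.release(); // execute() rejected the task; do not leak the slot
      throw e;
    }
  }

  // Largest number of tasks observed running at the same time.
  int maxInFlight() { return maxObserved.get(); }

  void close() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
```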
[jira] Assigned: (MAPREDUCE-2179) RaidBlockSender.java compilation fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali reassigned MAPREDUCE-2179: -- Assignee: Ramkumar Vadali RaidBlockSender.java compilation fails -- Key: MAPREDUCE-2179 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Giridharan Kesavan Assignee: Ramkumar Vadali Priority: Blocker https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/490/consoleFull Mapreduce trunk compilation is broken with compile: [echo] contrib: raid [javac] Compiling 27 source files to /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/contrib/raid/classes [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:71: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] private BlockTransferThrottler throttler; [javac] ^ [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:377: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] BlockTransferThrottler throttler) throws IOException { [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors
[jira] Updated: (MAPREDUCE-2179) RaidBlockSender.java compilation fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2179: --- Status: Patch Available (was: Open) RaidBlockSender.java compilation fails -- Key: MAPREDUCE-2179 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Giridharan Kesavan Assignee: Ramkumar Vadali Priority: Blocker Attachments: MAPREDUCE-2179.patch https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/490/consoleFull Mapreduce trunk compilation is broken with compile: [echo] contrib: raid [javac] Compiling 27 source files to /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/contrib/raid/classes [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:71: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] private BlockTransferThrottler throttler; [javac] ^ [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:377: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] BlockTransferThrottler throttler) throws IOException { [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors
[jira] Updated: (MAPREDUCE-2179) RaidBlockSender.java compilation fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2179: --- Attachment: MAPREDUCE-2179.patch r1032836 (HDFS-1457) removed the class BlockTransferThrottler. The RAID code does not need that functionality, so this patch just removes the dependence on BlockTransferThrottler. RaidBlockSender.java compilation fails -- Key: MAPREDUCE-2179 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Giridharan Kesavan Assignee: Ramkumar Vadali Priority: Blocker Attachments: MAPREDUCE-2179.patch https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/490/consoleFull Mapreduce trunk compilation is broken with compile: [echo] contrib: raid [javac] Compiling 27 source files to /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/contrib/raid/classes [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:71: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] private BlockTransferThrottler throttler; [javac] ^ [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:377: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] BlockTransferThrottler throttler) throws IOException { [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. 
[javac] 2 errors
[jira] Commented: (MAPREDUCE-2179) RaidBlockSender.java compilation fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929995#action_12929995 ] Ramkumar Vadali commented on MAPREDUCE-2179: ant test-patch will not run since trunk compilation is broken. I have run raid unit-tests: {code} test-junit: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Running org.apache.hadoop.hdfs.TestRaidDfs [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 41.64 sec [junit] Running org.apache.hadoop.raid.TestBlockFixer [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 139.487 sec [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.169 sec [junit] Running org.apache.hadoop.raid.TestErasureCodes [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 26.334 sec [junit] Running org.apache.hadoop.raid.TestGaloisField [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.399 sec [junit] Running org.apache.hadoop.raid.TestHarIndexParser [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.051 sec [junit] Running org.apache.hadoop.raid.TestRaidFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.473 sec [junit] Running org.apache.hadoop.raid.TestRaidHar [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 70.16 sec [junit] Running org.apache.hadoop.raid.TestRaidNode [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 406.19 sec [junit] Running org.apache.hadoop.raid.TestRaidPurge [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.977 sec [junit] Running org.apache.hadoop.raid.TestRaidShell [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 26.168 sec test: BUILD SUCCESSFUL 
Total time: 14 minutes 12 seconds {code} RaidBlockSender.java compilation fails -- Key: MAPREDUCE-2179 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Giridharan Kesavan Assignee: Ramkumar Vadali Priority: Blocker Attachments: MAPREDUCE-2179.patch https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/490/consoleFull Mapreduce trunk compilation is broken with compile: [echo] contrib: raid [javac] Compiling 27 source files to /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/contrib/raid/classes [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:71: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] private BlockTransferThrottler throttler; [javac] ^ [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:377: cannot find symbol [javac] symbol : class BlockTransferThrottler [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender [javac] BlockTransferThrottler throttler) throws IOException { [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors
[jira] Commented: (MAPREDUCE-1704) Parity files that are outdated or nonexistent should be immediately disregarded
[ https://issues.apache.org/jira/browse/MAPREDUCE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929239#action_12929239 ] Ramkumar Vadali commented on MAPREDUCE-1704: This is not an issue anymore. Parity files that are outdated or nonexistent should be immediately disregarded --- Key: MAPREDUCE-1704 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1704 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Scott Chen Fix For: 0.22.0 In the current implementation, old or nonexistent parity files are not immediately disregarded. A missing parity file will trigger exceptions, but an old one could lead to bad recoveries and possibly data corruption. This should be fixed.
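To illustrate what "immediately disregarded" could look like, here is a minimal Java sketch that classifies a parity file as missing, stale, or usable before any recovery is attempted. It assumes, as a hypothetical convention for this example, that a parity file is current only when it carries its source file's modification time; `FileInfo`, `ParityState`, and `classify` are illustrative names, not RAID contrib APIs.

```java
/**
 * Sketch: decide up front whether a parity file may be used for recovery.
 * FileInfo is a hypothetical stand-in for Hadoop's FileStatus.
 */
final class ParityCheck {

  /** Hypothetical metadata snapshot; only the modification time matters here. */
  static final class FileInfo {
    final long modificationTime;

    FileInfo(long modificationTime) {
      this.modificationTime = modificationTime;
    }
  }

  enum ParityState { MISSING, STALE, USABLE }

  /**
   * ASSUMPTION: a parity file is current only if it carries the same
   * modification time as its source file (stamped when parity is generated).
   * A null parity argument means no parity file exists.
   */
  static ParityState classify(FileInfo source, FileInfo parity) {
    if (parity == null) {
      return ParityState.MISSING; // nothing to recover from: reject now, not mid-recovery
    }
    if (parity.modificationTime != source.modificationTime) {
      return ParityState.STALE;   // source changed after the parity was built
    }
    return ParityState.USABLE;
  }

  /** Recovery proceeds only with current parity; missing or stale parity is disregarded. */
  static boolean canRecoverFrom(FileInfo source, FileInfo parity) {
    return classify(source, parity) == ParityState.USABLE;
  }
}
```

Checking staleness at lookup time means a stale parity file is treated exactly like a missing one, so it can never feed a reconstruction.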
[jira] Commented: (MAPREDUCE-1706) Log RAID recoveries on HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929242#action_12929242 ] Ramkumar Vadali commented on MAPREDUCE-1706: This looks good to me. +1 Log RAID recoveries on HDFS --- Key: MAPREDUCE-1706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1706 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Rodrigo Schmidt Assignee: Scott Chen Attachments: MAPREDUCE-1706.txt It would be good to have a way to centralize all the recovery logs, since recovery can be executed by any HDFS client. The best place to store this information is HDFS itself.
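A minimal sketch of centralized recovery logging, assuming a shared log directory (a local `Path` stands in for the HDFS directory here) and a hypothetical tab-separated record format. Each client writes its own file, since an HDFS file has only one writer at a time; `readAll` then gathers every client's records in one place. `RecoveryLog` and its methods are illustrative names only, not the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of centralized recovery logging. A local directory stands in for
 * a shared HDFS directory; record format and file naming are hypothetical.
 */
final class RecoveryLog {
  private final Path logFile;

  RecoveryLog(Path logDir, String clientId) {
    try {
      Files.createDirectories(logDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // One file per client: an HDFS file has a single writer at a time,
    // so concurrent clients should not append to the same file.
    this.logFile = logDir.resolve("recovery-" + clientId + ".log");
  }

  /** One tab-separated record: time, source file, block offset, outcome. */
  static String format(long timeMillis, String srcPath, long blockOffset, boolean recovered) {
    return timeMillis + "\t" + srcPath + "\t" + blockOffset + "\t"
        + (recovered ? "RECOVERED" : "FAILED");
  }

  /** Appends one record to this client's own log file. */
  void record(long timeMillis, String srcPath, long blockOffset, boolean recovered) {
    try {
      Files.write(logFile,
          Collections.singletonList(format(timeMillis, srcPath, blockOffset, recovered)),
          StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Gathers every client's records from the shared directory. */
  static List<String> readAll(Path logDir) {
    List<String> records = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(logDir, "recovery-*.log")) {
      for (Path f : files) {
        records.addAll(Files.readAllLines(f, StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return records;
  }
}
```

Per-client files trade a slightly more complex reader for never having two clients contend for the same HDFS lease.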
[jira] Resolved: (MAPREDUCE-2176) ant test-patch failing on a clean checkout
[ https://issues.apache.org/jira/browse/MAPREDUCE-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali resolved MAPREDUCE-2176. Resolution: Duplicate Dup of MAPREDUCE-2172. ant test-patch failing on a clean checkout -- Key: MAPREDUCE-2176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2176 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ramkumar Vadali ant test-patch fails for a dummy patch on CHANGES.txt:
{code}
[exec]
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] -1 findbugs. The patch appears to introduce 13 new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 1 warnings).
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
[exec]
[exec]
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
[exec]

BUILD FAILED
/data/users/rvadali/apache/hadoop-mapred-trunk/build.xml:1740: exec returned: 3

Total time: 13 minutes 14 seconds
Test results are in /tmp/rvadali.hadoopQA
[rvad...@dev502 hadoop-mapred-trunk]$ svn st
? build-fi
? SecurityAuth.audit
? lib/jdiff/hadoop-mapred_0.22.0-SNAPSHOT.xml
M CHANGES.txt
X src/test/bin
{code}
[jira] Commented: (MAPREDUCE-2172) test-patch.properties contains incorrect/version-dependent values of OK_FINDBUGS_WARNINGS and OK_RELEASEAUDIT_WARNINGS
[ https://issues.apache.org/jira/browse/MAPREDUCE-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928728#action_12928728 ] Ramkumar Vadali commented on MAPREDUCE-2172: @Nigel, both Patrick and I see ant test-patch fail on a clean checkout. We think this is related to HADOOP-7008. Is there some configuration we should change before running ant test-patch? test-patch.properties contains incorrect/version-dependent values of OK_FINDBUGS_WARNINGS and OK_RELEASEAUDIT_WARNINGS -- Key: MAPREDUCE-2172 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2172 Project: Hadoop Map/Reduce Issue Type: Bug Environment: FindBugs 1.3.4 Reporter: Patrick Kling Running ant test-patch with an empty patch yields 25 findbugs warnings and 3 release audit warnings (rather than the 0 findbugs warnings and 1 release audit warning specified in test-patch.properties):
{code}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] -1 findbugs. The patch appears to introduce 25 new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 1 warnings).
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
{code}
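The failure mode can be modeled in a few lines: the checks compare absolute warning counts against fixed OK_* baselines, so an environment that produces more warnings (for example, a different FindBugs version) fails even an empty patch. This Java sketch assumes test-patch.properties uses plain key=value lines holding the values quoted above; `TestPatchBaseline` and its methods are hypothetical, and the comparison is an illustrative model rather than the test-patch script itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Illustrative model of a baseline check: only warnings above the
 * configured OK_* value are blamed on the patch.
 */
final class TestPatchBaseline {
  // ASSUMED key=value contents, using the values cited in this issue.
  static final String DEFAULT_PROPERTIES =
      "OK_FINDBUGS_WARNINGS=0\n" + "OK_RELEASEAUDIT_WARNINGS=1\n";

  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Warnings attributed to the patch: observed count minus the allowed baseline. */
  static int excessWarnings(int observed, Properties props, String key) {
    int allowed = Integer.parseInt(props.getProperty(key, "0").trim());
    return Math.max(0, observed - allowed);
  }

  /** In this model a check scores -1 whenever any warnings exceed the baseline. */
  static boolean fails(int observed, Properties props, String key) {
    return excessWarnings(observed, props, key) > 0;
  }
}
```

With the counts reported for an empty patch, the findbugs excess is 25 and the release audit excess is 2, so both checks would score -1 although the patch changes nothing: the baseline, not the patch, decides the verdict.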
[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node
[ https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2167: --- Attachment: MAPREDUCE-2167.2.patch Using a semaphore now to track the active threads. The logic is much simpler now. Faster directory traversal for raid node Key: MAPREDUCE-2167 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.patch The RaidNode currently iterates over the directory structure to figure out which files to RAID. With millions of files, this can take a long time, especially if some files are already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine whether the file needs to be RAIDed. The directory traversal is encapsulated inside the class DirectoryTraversal, which examines one file at a time, using the caller's thread. My proposal is to make this multi-threaded as follows:
* Use a pool of threads inside DirectoryTraversal.
* The caller's thread is used to retrieve directories, and each new directory is assigned to a thread in the pool. The worker thread examines all the files in the directory.
* If there are sub-directories, they are added back as work items to the pool.
Comments?
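The proposal above can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the attached patch: `Dir` is a hypothetical in-memory directory node, `ParallelDirectoryTraversal` is an illustrative name, and a `Semaphore` with one permit per worker tracks which pool threads are busy, echoing the semaphore mentioned in the latest patch comment. The caller's thread hands out directories, workers examine files and push sub-directories back onto the queue, and traversal ends when the queue is empty and every permit is free.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Sketch of a pool-based traversal; Dir is a hypothetical stand-in for a directory listing. */
final class ParallelDirectoryTraversal {

  static final class Dir {
    final List<String> files = new ArrayList<>();
    final List<Dir> subdirs = new ArrayList<>();
  }

  private final int numThreads;

  ParallelDirectoryTraversal(int numThreads) {
    this.numThreads = numThreads;
  }

  /** Returns every file under root accepted by filter, in no particular order. */
  List<String> traverse(Dir root, Predicate<String> filter) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    Semaphore idle = new Semaphore(numThreads);   // one permit per idle worker
    BlockingQueue<Dir> pending = new LinkedBlockingQueue<>();
    List<String> selected = Collections.synchronizedList(new ArrayList<>());
    pending.add(root);
    try {
      while (true) {
        // The caller's thread retrieves directories and assigns them to workers.
        Dir dir = pending.poll(10, TimeUnit.MILLISECONDS);
        if (dir != null) {
          idle.acquire();                         // wait until a worker is free
          pool.execute(() -> {
            try {
              for (String f : dir.files) {        // examine all files in this directory
                if (filter.test(f)) {
                  selected.add(f);
                }
              }
              pending.addAll(dir.subdirs);        // sub-directories become new work items
            } finally {
              idle.release();                     // released only after enqueueing
            }
          });
        } else if (idle.tryAcquire(numThreads)) {
          // Every worker is idle, so no one can enqueue more work.
          boolean done = pending.isEmpty();
          idle.release(numThreads);
          if (done) {
            break;
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("traversal interrupted", e);
    } finally {
      pool.shutdown();
    }
    return new ArrayList<>(selected);
  }
}
```

Because a worker enqueues its sub-directories before releasing its permit, the caller can only see all permits free after every pending enqueue has happened, which makes the empty-queue check a safe termination test.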