[jira] [Created] (MAPREDUCE-2570) Bug in RAID FS (DistributedRaidFileSystem) unraid path

2011-06-06 Thread Ramkumar Vadali (JIRA)
Bug in RAID FS (DistributedRaidFileSystem) unraid path
--

 Key: MAPREDUCE-2570
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2570
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


The un-raid path in DistributedRaidFileSystem goes through 
RaidNode.unRaidCorruptBlock(), which has a bug when the parity file is inside a 
HAR. The temporary file that contains the recovered block contents is created 
in the filesystem that hosts the parity file. In case the parity file is inside 
a HAR, its filesystem is HarFileSystem, which is read-only. In this case the 
temporary file creation will fail. The fix is a one-line change to use the 
underlying filesystem of the HAR.
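
The idea behind the fix can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: `Fs`, `PlainFs`, and `ArchiveFs` are hypothetical stand-ins for the Hadoop filesystem classes, and `tempFileFs` is an invented helper name.

```java
// Hypothetical stand-ins; these are NOT the Hadoop FileSystem classes.
interface Fs {
    boolean isReadOnly();
}

final class PlainFs implements Fs {
    public boolean isReadOnly() { return false; }
}

// Models a read-only archive (such as a HAR) layered on another filesystem.
final class ArchiveFs implements Fs {
    private final Fs underlying;
    ArchiveFs(Fs underlying) { this.underlying = underlying; }
    public boolean isReadOnly() { return true; }
    Fs getUnderlying() { return underlying; }
}

final class UnraidTempFs {
    // Pick the filesystem on which the temporary recovered-block file is created.
    // If the parity file lives in an archive, use the filesystem beneath it.
    static Fs tempFileFs(Fs parityFs) {
        if (parityFs instanceof ArchiveFs) {
            return ((ArchiveFs) parityFs).getUnderlying();
        }
        return parityFs;
    }
}
```

With this choice the temporary file is never created on the read-only archive layer.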

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()

2011-05-23 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038086#comment-13038086
 ] 

Ramkumar Vadali commented on MAPREDUCE-2186:


The main motivation to open this jira was to allow CombineFileInputFormat to 
work when there are missing blocks. CombineFileInputFormat figures out the 
host/rack information for input blocks and uses that information to create 
input splits. It does not handle the case where a block does not have any 
host/rack information.

The proposed fix, returning the location of parity blocks when source blocks 
are missing, is not good because it fixes the problem in the wrong place. It 
also gives us false locality. 
Instead of changing RAID FS to handle this case, it's better to fix CFIF 
(CombineFileInputFormat) to handle missing blocks (MAPREDUCE-2185).

 DistributedRaidFileSystem should implement getFileBlockLocations()
 --

 Key: MAPREDUCE-2186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 If a RAIDed file has missing blocks, 
 DistributedRaidFileSystem.getFileBlockLocations() would return no block 
 locations. This could lead a client to believe that the file is not readable. 
 But if parity data is available, the file actually is readable.
 It would be better to implement getFileBlockLocations() and return the 
 location of the parity blocks that would be needed to reconstruct the missing 
 block.



[jira] [Resolved] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali resolved MAPREDUCE-2186.


Resolution: Won't Fix

Better to fix MAPREDUCE-2185

 DistributedRaidFileSystem should implement getFileBlockLocations()
 --

 Key: MAPREDUCE-2186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 If a RAIDed file has missing blocks, 
 DistributedRaidFileSystem.getFileBlockLocations() would return no block 
 locations. This could lead a client to believe that the file is not readable. 
 But if parity data is available, the file actually is readable.
 It would be better to implement getFileBlockLocations() and return the 
 location of the parity blocks that would be needed to reconstruct the missing 
 block.



[jira] [Assigned] (MAPREDUCE-2498) TestRaidShellFsck failing on trunk

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali reassigned MAPREDUCE-2498:
--

Assignee: Ramkumar Vadali  (was: Todd Lipcon)

 TestRaidShellFsck failing on trunk
 --

 Key: MAPREDUCE-2498
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2498
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
 Fix For: 0.23.0

 Attachments: mapreduce-2498.txt


 TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2 has been failing the 
 last several builds:
 Error Message: parity file not HARed after 40s
 java.io.IOException: parity file not HARed after 40s
   at org.apache.hadoop.raid.TestRaidShellFsck.raidTestFiles(TestRaidShellFsck.java:281)
   at org.apache.hadoop.raid.TestRaidShellFsck.setUp(TestRaidShellFsck.java:181)
   at org.apache.hadoop.raid.TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2(TestRaidShellFsck.java:666)



[jira] [Updated] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2185:
---

Attachment: MAPREDUCE-2185.patch

For blocks that do not have hosts associated with them, use 
NetworkTopology.DEFAULT_RACK as the rack location. This avoids the infinite 
loop later on in getMoreSplits().

 Infinite loop at creating splits using CombineFileInputFormat
 -

 Key: MAPREDUCE-2185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: MAPREDUCE-2185.patch


 This is caused by a missing block in HDFS, so the block's locations are 
 empty. The following code adds the block to the blockToNodes map but not to 
 the rackToBlocks map. Later on, when generating splits, only blocks in 
 rackToBlocks are removed from the blockToNodes map. So the blockToNodes map 
 can never become empty, which causes an infinite loop.
 {code}
   // add this block to the block --> node locations map
   blockToNodes.put(oneblock, oneblock.hosts);
   // add this block to the rack --> block map
   for (int j = 0; j < oneblock.racks.length; j++) {
  ..
   }
 {code}
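
 The bug and the proposed default-rack fallback can be illustrated with a 
 small self-contained sketch. It is not the attached patch: `Block`, 
 `buildRackToBlocks`, and `drain` are hypothetical simplifications, and the 
 local `DEFAULT_RACK` constant stands in for NetworkTopology.DEFAULT_RACK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RackFallbackSketch {
    // Local stand-in for NetworkTopology.DEFAULT_RACK (assumption for this sketch).
    static final String DEFAULT_RACK = "/default-rack";

    // Simplified stand-in for the per-block record used when building splits.
    static final class Block {
        final String name;
        final String[] hosts;
        final String[] racks;
        Block(String name, String[] hosts, String[] racks) {
            this.name = name;
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    // Fill blockToNodes and return the rack -> blocks map.
    // Blocks with no rack information are filed under DEFAULT_RACK.
    static Map<String, List<Block>> buildRackToBlocks(List<Block> blocks,
                                                      Map<Block, String[]> blockToNodes) {
        Map<String, List<Block>> rackToBlocks = new HashMap<>();
        for (Block b : blocks) {
            blockToNodes.put(b, b.hosts);
            String[] racks = b.racks.length == 0 ? new String[] {DEFAULT_RACK} : b.racks;
            for (String rack : racks) {
                rackToBlocks.computeIfAbsent(rack, k -> new ArrayList<>()).add(b);
            }
        }
        return rackToBlocks;
    }

    // Remove every block reachable through rackToBlocks from blockToNodes and
    // return how many blocks are left over. A non-zero result is the condition
    // under which a "loop until blockToNodes is empty" would never terminate.
    static int drain(List<Block> blocks) {
        Map<Block, String[]> blockToNodes = new HashMap<>();
        Map<String, List<Block>> rackToBlocks = buildRackToBlocks(blocks, blockToNodes);
        for (List<Block> perRack : rackToBlocks.values()) {
            for (Block b : perRack) {
                blockToNodes.remove(b);
            }
        }
        return blockToNodes.size();
    }
}
```

 Because every block now appears under at least one rack, draining by rack 
 empties blockToNodes even when a block has no hosts.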



[jira] [Updated] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2185:
---

Assignee: Ramkumar Vadali  (was: Hairong Kuang)
  Status: Patch Available  (was: Open)

 Infinite loop at creating splits using CombineFileInputFormat
 -

 Key: MAPREDUCE-2185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Reporter: Hairong Kuang
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2185.patch


 This is caused by a missing block in HDFS, so the block's locations are 
 empty. The following code adds the block to the blockToNodes map but not to 
 the rackToBlocks map. Later on, when generating splits, only blocks in 
 rackToBlocks are removed from the blockToNodes map. So the blockToNodes map 
 can never become empty, which causes an infinite loop.
 {code}
   // add this block to the block --> node locations map
   blockToNodes.put(oneblock, oneblock.hosts);
   // add this block to the rack --> block map
   for (int j = 0; j < oneblock.racks.length; j++) {
  ..
   }
 {code}



[jira] [Created] (MAPREDUCE-2482) Enable RAID contrib in trunk

2011-05-10 Thread Ramkumar Vadali (JIRA)
Enable RAID contrib in trunk


 Key: MAPREDUCE-2482
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2482
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


The RAID contrib project can be re-enabled since federation related changes are 
now in.



[jira] [Resolved] (MAPREDUCE-2482) Enable RAID contrib in trunk

2011-05-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali resolved MAPREDUCE-2482.


Resolution: Duplicate

Duplicate of MAPREDUCE-2467. For some reason I thought MAPREDUCE-2467 was 
committed.

 Enable RAID contrib in trunk
 

 Key: MAPREDUCE-2482
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2482
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 The RAID contrib project can be re-enabled since federation related changes 
 are now in.



[jira] [Commented] (MAPREDUCE-2467) HDFS-1052 changes break the raid contrib module in MapReduce

2011-05-10 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031357#comment-13031357
 ] 

Ramkumar Vadali commented on MAPREDUCE-2467:


Hi Suresh,

Sorry for the delay in responding. I think the test failures are unrelated:
 1. testConcurrentJobs is failing because one file is not detected as corrupt, 
so the block fixer does not fix it. This seems to be intermittent, since I 
tried running the test and it succeeded twice. I don't think this is related to 
federation. Perhaps we can track it separately?
 2. testFileBlockAndParityBlockMissingHar2 failed because of insufficient heap 
space when running a HAR job through LocalJobRunner. Again, unrelated to 
federation.
 3. testJobQueues failed because of a timeout. Also, RAID changes cannot 
affect core mapred tests, so this must be unrelated.

 HDFS-1052 changes break the raid contrib module in MapReduce
 

 Key: MAPREDUCE-2467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: MR-2467.1.patch, MR-2467.2.patch, MR-2467.3.patch, 
 MR-2467.patch


 Raid contrib module requires changes to work with the federation changes made 
 in HDFS-1052.



[jira] [Commented] (MAPREDUCE-2467) HDFS-1052 changes break the raid contrib module in MapReduce

2011-05-03 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028502#comment-13028502
 ] 

Ramkumar Vadali commented on MAPREDUCE-2467:


+1 looks good

 HDFS-1052 changes break the raid contrib module in MapReduce
 

 Key: MAPREDUCE-2467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: MR-2467.1.patch, MR-2467.patch


 Raid contrib module requires changes to work with the federation changes made 
 in HDFS-1052.



[jira] [Commented] (MAPREDUCE-2465) HDFS raid not compiling after federation merge

2011-05-03 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028504#comment-13028504
 ] 

Ramkumar Vadali commented on MAPREDUCE-2465:


Suresh, the patch for MAPREDUCE-2467 looks good.

 HDFS raid not compiling after federation merge
 --

 Key: MAPREDUCE-2465
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2465
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Blocker
 Attachments: disable-raid-compilation.txt, failure.txt, 
 fix-compile-but-raid-broken.txt


 The RAID contrib is no longer compiling now that federation has been merged, 
 due to some API changes in LocatedBlock and FSDataset.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2467) HDFS-1052 changes break the raid contrib module in MapReduce

2011-05-03 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028503#comment-13028503
 ] 

Ramkumar Vadali commented on MAPREDUCE-2467:


Thanks for making the changes!

 HDFS-1052 changes break the raid contrib module in MapReduce
 

 Key: MAPREDUCE-2467
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2467
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: MR-2467.1.patch, MR-2467.patch


 Raid contrib module requires changes to work with the federation changes made 
 in HDFS-1052.



[jira] [Commented] (MAPREDUCE-2465) HDFS raid not compiling after federation merge

2011-05-02 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027941#comment-13027941
 ] 

Ramkumar Vadali commented on MAPREDUCE-2465:


I will work on making this compile

 HDFS raid not compiling after federation merge
 --

 Key: MAPREDUCE-2465
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2465
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Blocker
 Attachments: disable-raid-compilation.txt, failure.txt, 
 fix-compile-but-raid-broken.txt


 The RAID contrib is no longer compiling now that federation has been merged, 
 due to some API changes in LocatedBlock and FSDataset.



[jira] [Created] (MAPREDUCE-2436) RAID block fixer should prioritize block fix operations

2011-04-13 Thread Ramkumar Vadali (JIRA)
RAID block fixer should prioritize block fix operations
---

 Key: MAPREDUCE-2436
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2436
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


The RAID block fixer submits mapreduce jobs to fix corrupt files. This is OK 
for XOR RAID, but with Reed-Solomon RAID, there can be a large number of 
corrupt files when even a single datanode dies. With Reed-Solomon RAID, it is 
better to categorize corrupt files based on urgency. Files with only one 
corrupt block can be treated as lower priority than those with more corrupt 
blocks.
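
One way to picture the proposed ordering is the sketch below. It is only an illustration: `CorruptFile` and `byUrgency` are hypothetical names, and ranking purely by corrupt-block count is a simplifying assumption rather than the block fixer's actual policy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CorruptFilePrioritySketch {
    // Hypothetical record: a file path plus its number of corrupt blocks.
    static final class CorruptFile {
        final String path;
        final int corruptBlocks;
        CorruptFile(String path, int corruptBlocks) {
            this.path = path;
            this.corruptBlocks = corruptBlocks;
        }
    }

    // Return a copy of the list with the most damaged files first.
    static List<CorruptFile> byUrgency(List<CorruptFile> files) {
        List<CorruptFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt((CorruptFile f) -> f.corruptBlocks).reversed());
        return sorted;
    }
}
```

Files with a single corrupt block end up at the back of the list, so jobs for them can be submitted last.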



[jira] [Commented] (MAPREDUCE-2395) TestBlockFixer timing out on trunk

2011-03-22 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009779#comment-13009779
 ] 

Ramkumar Vadali commented on MAPREDUCE-2395:


Yes, I saw that but could not reproduce it. Also, it is weird since this patch 
has only test code changes.

 TestBlockFixer timing out on trunk
 --

 Key: MAPREDUCE-2395
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2395.patch


 In recent Hudson builds, TestBlockFixer has been timing out. Not clear how 
 long it has been broken since MAPREDUCE-2394 was hiding the RAID tests from 
 Hudson's test result parsing.



[jira] Updated: (MAPREDUCE-2368) RAID DFS regression

2011-03-18 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2368:
---

Status: Open  (was: Patch Available)

Will resubmit patch now that MiniMRCluster delays are resolved.

 RAID DFS regression
 ---

 Key: MAPREDUCE-2368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.20.3

 Attachments: MAPREDUCE-2368.patch


 The patch for MAPREDUCE-2248 did not handle zero-length files correctly, 
 which leads to ArrayIndexOutOfBoundsException when opening a zero-length 
 file. That case needs special handling.



[jira] Updated: (MAPREDUCE-2368) RAID DFS regression

2011-03-18 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2368:
---

Status: Patch Available  (was: Open)

 RAID DFS regression
 ---

 Key: MAPREDUCE-2368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.20.3

 Attachments: MAPREDUCE-2368.patch


 The patch for MAPREDUCE-2248 did not handle zero-length files correctly, 
 which leads to ArrayIndexOutOfBoundsException when opening a zero-length 
 file. That case needs special handling.



[jira] Assigned: (MAPREDUCE-2395) TestBlockFixer timing out on trunk

2011-03-17 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali reassigned MAPREDUCE-2395:
--

Assignee: Ramkumar Vadali

 TestBlockFixer timing out on trunk
 --

 Key: MAPREDUCE-2395
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
 Fix For: 0.23.0


 In recent Hudson builds, TestBlockFixer has been timing out. Not clear how 
 long it has been broken since MAPREDUCE-2394 was hiding the RAID tests from 
 Hudson's test result parsing.



[jira] Updated: (MAPREDUCE-2395) TestBlockFixer timing out on trunk

2011-03-17 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2395:
---

Attachment: MAPREDUCE-2395.patch

Breaks TestBlockFixer into several tests. The file TestBlockFixer.java now has 
tests that do not use a MiniMRCluster. The other TestBlockFixer*.java files 
have a few tests each that use MiniMRCluster.

 TestBlockFixer timing out on trunk
 --

 Key: MAPREDUCE-2395
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2395.patch


 In recent Hudson builds, TestBlockFixer has been timing out. Not clear how 
 long it has been broken since MAPREDUCE-2394 was hiding the RAID tests from 
 Hudson's test result parsing.



[jira] Updated: (MAPREDUCE-2395) TestBlockFixer timing out on trunk

2011-03-17 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2395:
---

Status: Patch Available  (was: Open)

 TestBlockFixer timing out on trunk
 --

 Key: MAPREDUCE-2395
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2395
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2395.patch


 In recent Hudson builds, TestBlockFixer has been timing out. Not clear how 
 long it has been broken since MAPREDUCE-2394 was hiding the RAID tests from 
 Hudson's test result parsing.



[jira] Created: (MAPREDUCE-2368) RAID DFS regression

2011-03-08 Thread Ramkumar Vadali (JIRA)
RAID DFS regression
---

 Key: MAPREDUCE-2368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.20.3


The patch for MAPREDUCE-2248 did not handle zero-length files correctly, which 
leads to ArrayIndexOutOfBoundsException when opening a zero-length file. That 
case needs special handling.



[jira] Updated: (MAPREDUCE-2368) RAID DFS regression

2011-03-08 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2368:
---

Attachment: MAPREDUCE-2368.patch

This patch handles the len == 0 case.
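
The thread does not show where the exception is thrown, so the following is only a generic illustration of the kind of guard involved. `lastBlockSize` and its parameters are hypothetical and are not taken from the patch.

```java
final class ZeroLengthGuardSketch {
    // Return the size of the last block of a file.
    // A zero-length file has no blocks, so indexing blockSizes[length - 1]
    // without the guard would throw ArrayIndexOutOfBoundsException.
    static long lastBlockSize(long fileLen, long[] blockSizes) {
        if (fileLen == 0) {
            return 0L; // special case: empty file, nothing to index
        }
        return blockSizes[blockSizes.length - 1];
    }
}
```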

 RAID DFS regression
 ---

 Key: MAPREDUCE-2368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.20.3

 Attachments: MAPREDUCE-2368.patch


 The patch for MAPREDUCE-2248 did not handle zero-length files correctly, 
 which leads to ArrayIndexOutOfBoundsException when opening a zero-length 
 file. That case needs special handling.



[jira] Commented: (MAPREDUCE-2368) RAID DFS regression

2011-03-08 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004133#comment-13004133
 ] 

Ramkumar Vadali commented on MAPREDUCE-2368:


The failed contrib tests were in mumak and raid. I will take a look at the raid 
test failure.
 [exec] [junit] Test org.apache.hadoop.mapred.TestSimulatorEndToEnd FAILED (timeout)
 [exec] [junit] Test org.apache.hadoop.mapred.TestSimulatorSerialJobSubmission FAILED (timeout)
 [exec] [junit] Test org.apache.hadoop.mapred.TestSimulatorStressJobSubmission FAILED (timeout)
 [exec] [junit] Test org.apache.hadoop.raid.TestBlockFixer FAILED (timeout)


 RAID DFS regression
 ---

 Key: MAPREDUCE-2368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.20.3

 Attachments: MAPREDUCE-2368.patch


 The patch for MAPREDUCE-2248 did not handle zero-length files correctly, 
 which leads to ArrayIndexOutOfBoundsException when opening a zero-length 
 file. That case needs special handling.



[jira] Commented: (MAPREDUCE-2239) BlockPlacementPolicyRaid should call getBlockLocations only when necessary

2011-03-03 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002493#comment-13002493
 ] 

Ramkumar Vadali commented on MAPREDUCE-2239:


+1
Patch looks good

 BlockPlacementPolicyRaid should call getBlockLocations only when necessary
 --

 Key: MAPREDUCE-2239
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2239
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2239-1.txt, MAPREDUCE-2239-2.txt, 
 MAPREDUCE-2239-3.txt, MAPREDUCE-2239.txt


 Currently BlockPlacementPolicyRaid calls getBlockLocations for every 
 chooseTarget().
 This puts pressure on NameNode. We should avoid calling if this file is not 
 raided or a parity file.





[jira] Created: (MAPREDUCE-2347) RAID blockfixer should check file blocks after the file is fixed

2011-02-28 Thread Ramkumar Vadali (JIRA)
RAID blockfixer should check file blocks after the file is fixed


 Key: MAPREDUCE-2347
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2347
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.20.2
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


After a file is fixed by the block fixer, all its blocks should be checked for 
the presence of replicas. If any block is still missing valid replicas, it 
should be fixed again.





[jira] Created: (MAPREDUCE-2333) RAID jobs should delete temporary files in the event of filesystem failures

2011-02-16 Thread Ramkumar Vadali (JIRA)
RAID jobs should delete temporary files in the event of filesystem failures
---

 Key: MAPREDUCE-2333
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2333
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


If the creation of a parity file or parity file HAR fails due to a filesystem 
level error, RAID should delete the temporary files. Specifically, datanode 
death during parity file creation would cause FSDataOutputStream.close() to 
throw an IOException. The RAID code should delete such a file.
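
The cleanup pattern described above might look roughly like this. It is a sketch using java.nio on a local filesystem rather than the Hadoop FileSystem API; `writeOrCleanUp` and the `simulateFailure` flag are hypothetical and exist only to make the failure path easy to exercise.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileCleanupSketch {
    // Write data to tmp. If writing or closing fails with an IOException,
    // delete the partial file and rethrow the original exception.
    static void writeOrCleanUp(Path tmp, byte[] data, boolean simulateFailure)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(data);
            if (simulateFailure) {
                throw new IOException("simulated filesystem failure");
            }
        } catch (IOException e) {
            // The stream is already closed here, so the file can be removed.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }
}
```

The catch block of a try-with-resources statement runs after the stream has been closed, so a failure in close() is handled by the same cleanup.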





[jira] Created: (MAPREDUCE-2329) RAID BlockFixer should exclude temporary files

2011-02-15 Thread Ramkumar Vadali (JIRA)
RAID BlockFixer should exclude temporary files
--

 Key: MAPREDUCE-2329
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2329
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


RAID BlockFixer should exclude files matching the pattern ^/tmp/.*
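
As a small illustration of what that pattern accepts and rejects, here is a sketch using java.util.regex. The class and method names are hypothetical and not part of the BlockFixer code.

```java
import java.util.regex.Pattern;

final class TempPathFilterSketch {
    // The pattern quoted in the issue: any path under the top-level /tmp directory.
    private static final Pattern TMP = Pattern.compile("^/tmp/.*");

    // True if the block fixer should skip this path.
    static boolean isExcluded(String path) {
        return TMP.matcher(path).matches();
    }
}
```

Note that the pattern is anchored at the root, so a path such as /user/tmp/part-0 is not excluded.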





[jira] Updated: (MAPREDUCE-2329) RAID BlockFixer should exclude temporary files

2011-02-15 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2329:
---

Component/s: contrib/raid

 RAID BlockFixer should exclude temporary files
 --

 Key: MAPREDUCE-2329
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2329
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

 RAID BlockFixer should exclude files matching the pattern ^/tmp/.*





[jira] Updated: (MAPREDUCE-2320) RAID DistBlockFixer should limit pending jobs instead of pending files

2011-02-11 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2320:
---

  Component/s: contrib/raid
Affects Version/s: 0.20.3, 0.20.2
   Issue Type: Improvement  (was: Bug)

 RAID DistBlockFixer should limit pending jobs instead of pending files
 --

 Key: MAPREDUCE-2320
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2320
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

 DistBlockFixer limits the number of files being fixed simultaneously to avoid 
 an unlimited backlog. This limits the number of parallel jobs though, and if 
 one job has a long-running task, it prevents newer jobs from being started. 
 Instead, it should have a limit on running jobs. That way, one long-running 
 task will not block other jobs.





[jira] Created: (MAPREDUCE-2320) RAID DistBlockFixer should limit pending jobs instead of pending files

2011-02-11 Thread Ramkumar Vadali (JIRA)
RAID DistBlockFixer should limit pending jobs instead of pending files
--

 Key: MAPREDUCE-2320
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2320
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


DistBlockFixer limits the number of files being fixed simultaneously to avoid 
an unlimited backlog. This limits the number of parallel jobs though, and if 
one job has a long-running task, it prevents newer jobs from being started. 
Instead, it should have a limit on running jobs. That way, one long-running 
task will not block other jobs.
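
A job-count limit of this kind could be sketched as below. This is an assumption-laden illustration, not DistBlockFixer code: the class, `tryStart`, and `finished` are hypothetical names.

```java
final class RunningJobLimiterSketch {
    private final int maxRunningJobs;
    private int running;

    RunningJobLimiterSketch(int maxRunningJobs) {
        this.maxRunningJobs = maxRunningJobs;
    }

    // Reserve a slot for a new job; returns false when the limit is reached.
    synchronized boolean tryStart() {
        if (running >= maxRunningJobs) {
            return false;
        }
        running++;
        return true;
    }

    // Release the slot of a job that has completed.
    synchronized void finished() {
        if (running > 0) {
            running--;
        }
    }

    synchronized int running() {
        return running;
    }
}
```

Because the limit counts jobs rather than files, a job stuck on one long-running task occupies a single slot and the remaining slots stay available.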





[jira] Created: (MAPREDUCE-2313) RAID code does not close some opened streams

2011-02-09 Thread Ramkumar Vadali (JIRA)
RAID code does not close some opened streams


 Key: MAPREDUCE-2313
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2313
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


There are some instances where opened streams are not closed, leading to a file 
descriptor leak.
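
As a general illustration of the pattern (not the actual RAID fix), a stream can be closed on every exit path with try-with-resources on Java 7 or later. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; shows only the close-on-all-paths pattern.
final class CloseStreamSketch {
  /** Counts the bytes in the stream and closes it even if read() throws. */
  static long countBytes(InputStream in) throws IOException {
    try (InputStream stream = in) {
      long n = 0;
      while (stream.read() != -1) {
        n++;
      }
      return n;
    }
  }
}
```

On older Java versions the same effect needs an explicit try/finally block that calls close().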





[jira] Created: (MAPREDUCE-2312) Better error handling in RaidShell

2011-02-08 Thread Ramkumar Vadali (JIRA)
Better error handling in RaidShell
--

 Key: MAPREDUCE-2312
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2312
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


If there is an error trying to find the parity information for a corrupt file, 
RaidShell should print it as corrupt, instead of bailing.





[jira] Created: (MAPREDUCE-2303) RAID BlockFixer should choose targets better

2011-02-04 Thread Ramkumar Vadali (JIRA)
RAID BlockFixer should choose targets better


 Key: MAPREDUCE-2303
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2303
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


The RAID BlockFixer chooses the destination of the generated block at random. 
It avoids nodes that have a corrupt replica of the block, but does not do 
anything beyond that. It needs to avoid data nodes that have a replica of any 
source or parity block in the block's stripe.
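
A minimal sketch of the selection rule described above, assuming nodes are identified by plain strings. TargetChooserSketch and chooseTarget are hypothetical names and do not reflect the BlockFixer API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; node names are plain strings, not real datanode descriptors.
final class TargetChooserSketch {
  /**
   * Picks a random node that holds no replica of any source or parity block
   * in the stripe. Returns null when every live node is excluded.
   */
  static String chooseTarget(Collection<String> liveNodes,
                             Collection<String> nodesWithStripeReplicas,
                             Random rand) {
    Set<String> excluded = new HashSet<>(nodesWithStripeReplicas);
    List<String> candidates = new ArrayList<>();
    for (String node : liveNodes) {
      if (!excluded.contains(node)) {
        candidates.add(node);
      }
    }
    if (candidates.isEmpty()) {
      return null;
    }
    return candidates.get(rand.nextInt(candidates.size()));
  }
}
```

The choice stays random, but only among nodes that would not put two blocks of the same stripe on one machine.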





[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-02-03 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2267:
---

Status: Open  (was: Patch Available)

Will upload another patch

 Parallelize reading of blocks within a stripe
 -

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, 
 MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch


 RAID code has several instances where several blocks of data have to be read 
 to perform an operation. For example, computing a parity block requires 
 reading the blocks of the source file. Similarly, generating a fixed block 
 requires reading a parity block and the good blocks from the source file. 
 These read operations proceed sequentially currently. RAID code should use a 
 thread pool to increase the parallelism and thus reduce latency.
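
 The thread-pool idea can be sketched as below. This is an illustration under stated assumptions, not the code from the attached patches: StripeReadSketch and BlockSource are hypothetical names, and each BlockSource stands in for reading one block of the stripe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; StripeReadSketch and BlockSource are hypothetical names.
final class StripeReadSketch {
  /** Stand-in for "read one block of the stripe". */
  interface BlockSource {
    byte[] read() throws Exception;
  }

  /** Reads every block on a pool thread; results keep the order of the input list. */
  static List<byte[]> readAll(List<BlockSource> sources, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Callable<byte[]>> tasks = new ArrayList<>();
      for (BlockSource s : sources) {
        tasks.add(s::read);
      }
      List<byte[]> blocks = new ArrayList<>();
      // invokeAll blocks until all reads complete and returns futures in task order.
      for (Future<byte[]> f : pool.invokeAll(tasks)) {
        blocks.add(f.get());
      }
      return blocks;
    } finally {
      pool.shutdown();
    }
  }
}
```

 With this shape, the latency of reading a stripe approaches that of the slowest single block read rather than the sum of all reads, provided the pool has enough threads.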





[jira] Commented: (MAPREDUCE-2285) MiniMRCluster does not start after ant test-patch

2011-01-28 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988220#action_12988220
 ] 

Ramkumar Vadali commented on MAPREDUCE-2285:


The patch fixes the problem. I am no Ivy expert, but it looks good to me.

 MiniMRCluster does not start after ant test-patch
 -

 Key: MAPREDUCE-2285
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2285
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Ramkumar Vadali
Priority: Blocker
 Attachments: cp-bad, cp-good, fix-build.diff


 Any test using MiniMRCluster hangs in the MiniMRCluster constructor after 
 running ant test-patch. Steps to reproduce:
  1. ant -Dpatch.file=<dummy patch to CHANGES.txt> -Dforrest.home=<path to forrest> -Dfindbugs.home=<path to findbugs> -Dscratch.dir=/tmp/testpatch -Djava5.home=<path to java5> test-patch
  2. Run any test that creates a MiniMRCluster, say ant test 
 -Dtestcase=TestFileArgs (contrib/streaming)
 Expected result: Test should succeed.
 Actual result: Test hangs in MiniMRCluster.<init>. This does not happen if 
 we run ant clean after ant test-patch.
 Test output:
 {code}
 [junit] 11/01/27 12:11:43 INFO ipc.Server: IPC Server handler 3 on 58675: 
 starting
 [junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: TaskTracker up at: 
 localhost.localdomain/127.0.0.1:58675
 [junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: Starting tracker 
 tracker_host0.foo.com:localhost.localdomain/127.0.0.1:58675
 [junit] 11/01/27 12:11:44 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 0 time(s).
 [junit] 11/01/27 12:11:45 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 1 time(s).
 [junit] 11/01/27 12:11:46 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 2 time(s).
 [junit] 11/01/27 12:11:47 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 3 time(s).
 [junit] 11/01/27 12:11:48 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 4 time(s).
 [junit] 11/01/27 12:11:49 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 5 time(s).
 [junit] 11/01/27 12:11:50 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 6 time(s).
 [junit] 11/01/27 12:11:51 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 7 time(s).
 [junit] 11/01/27 12:11:52 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 8 time(s).
 [junit] 11/01/27 12:11:53 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:0. Already tried 9 time(s).
 [junit] 11/01/27 12:11:53 INFO ipc.RPC: Server at localhost/127.0.0.1:0 
 not available yet, Z...
 {code}
 Stack trace: 
 {code}
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:611)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:429)
 - locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:504)
 - locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection)
 at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:206)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164)
 at org.apache.hadoop.ipc.Client.call(Client.java:1008)
 at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
 at org.apache.hadoop.mapred.$Proxy11.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:235)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:275)
 at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:206)
 at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:185)
 at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:169)
 at org.apache.hadoop.mapred.TaskTracker$2.run(TaskTracker.java:699)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142)
 at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:695)
 - locked 0x7f3b8ccc3870 (a org.apache.hadoop.mapred.TaskTracker)
 at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1391)
 at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.createTaskTracker(MiniMRCluster.java:219)
  

[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster

2011-01-27 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987669#action_12987669
 ] 

Ramkumar Vadali commented on MAPREDUCE-2283:


Update:

If I run ant clean from the top-level and run `ant test 
-Dtestcase=TestBlockFixer`, it runs fine.
But if I run ant test-patch from the top level and run it again, it gets stuck. 
I ran with test.output=yes to see what was going on, and found this:

{code}
[junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: TaskTracker up at: 
localhost.localdomain/127.0.0.1:50197
[junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: Starting tracker 
tracker_host0.foo.com:localhost.localdomain/127.0.0.1:50197
[junit] 11/01/27 09:21:25 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 0 time(s).
[junit] 11/01/27 09:21:26 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 1 time(s).
[junit] 11/01/27 09:21:27 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 2 time(s).
[junit] 11/01/27 09:21:28 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 3 time(s).
[junit] 11/01/27 09:21:29 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 4 time(s).
[junit] 11/01/27 09:21:30 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 5 time(s).
[junit] 11/01/27 09:21:31 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 6 time(s).
[junit] 11/01/27 09:21:32 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 7 time(s).
[junit] 11/01/27 09:21:33 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 8 time(s).
[junit] 11/01/27 09:21:34 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 9 time(s).
[junit] 11/01/27 09:21:34 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not 
available yet, Z...
{code}

I think Hudson does something like this, and ant test-patch is somehow pulling 
in a jar that prevents MiniMRCluster from starting. To check, I wrote a simple 
test that only tries to start a MiniMRCluster:

{code}
import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;

public class TestStuckMiniMR extends TestCase {
  public static final int NUM_DATANODES = 3;
  Configuration conf;
  String namenode = null;
  MiniDFSCluster dfs = null;
  MiniMRCluster mr = null;
  String jobTrackerName = null;
  FileSystem fileSys = null;

  protected void setUp() throws Exception {
    conf = new Configuration();

    // Start a mini HDFS cluster and wait for it to become active.
    dfs = new MiniDFSCluster(conf, NUM_DATANODES, true, null);
    dfs.waitActive();
    fileSys = dfs.getFileSystem();
    namenode = fileSys.getUri().toString();

    // Start a mini MapReduce cluster on top of it; the test gets stuck here.
    FileSystem.setDefaultUri(conf, namenode);
    mr = new MiniMRCluster(4, namenode, 3);
    jobTrackerName = "localhost:" + mr.getJobTrackerPort();
  }

  protected void tearDown() {
    dfs.shutdown();
    mr.shutdown();
  }

  public void testStuck() throws Exception {
    System.out.println("Done");
  }
}
{code}
This also gets stuck in setUp(). So I think the problem is outside RAID. In 
fact, just after I tried this, I tried running a test under contrib/streaming. 
That also gets stuck the same way.

{code}
ant test -Dtestcase=TestFileArgs -Dtest.output=yes
{code}

The output:

{code}
[junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: TaskTracker up at: 
localhost.localdomain/127.0.0.1:59339
[junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: Starting tracker 
tracker_host0.foo.com:localhost.localdomain/127.0.0.1:59339
[junit] 11/01/27 09:42:11 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 0 time(s).
[junit] 11/01/27 09:42:12 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 1 time(s).
[junit] 11/01/27 09:42:13 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 2 time(s).
{code}

Can someone try killing TestBlockFixer and running TestFileArgs on the machine 
that's running Hudson?

 TestBlockFixer hangs initializing MiniMRCluster
 ---

 Key: MAPREDUCE-2283
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0


 TestBlockFixer (a raid contrib test) is hanging the precommit testing on 
 Hudson

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster

2011-01-27 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2283:
---

Attachment: MAPREDUCE-2283.patch

This enables a timeout for RAID tests. This does not fix the MiniMRCluster 
problem though.
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  
jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.hadoop.raid.TestBlockFixer FAILED (timeout)

BUILD FAILED
/data/users/rvadali/apache/hadoop-mapred-trunk/build.xml:821: The following 
error occurred while executing this line:
/data/users/rvadali/apache/hadoop-mapred-trunk/build.xml:805: The following 
error occurred while executing this line:
/data/users/rvadali/apache/hadoop-mapred-trunk/src/contrib/build.xml:60: The 
following error occurred while executing this line:
/data/users/rvadali/apache/hadoop-mapred-trunk/src/contrib/raid/build.xml:60: 
Tests failed!

{code}

 TestBlockFixer hangs initializing MiniMRCluster
 ---

 Key: MAPREDUCE-2283
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MAPREDUCE-2283.patch


 TestBlockFixer (a raid contrib test) is hanging the precommit testing on 
 Hudson




[jira] Created: (MAPREDUCE-2285) MiniMRCluster does not start after ant test-patch

2011-01-27 Thread Ramkumar Vadali (JIRA)
MiniMRCluster does not start after ant test-patch
-

 Key: MAPREDUCE-2285
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2285
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Ramkumar Vadali


Any test using MiniMRCluster hangs in the MiniMRCluster constructor after 
running ant test-patch. Steps to reproduce:
 1. ant -Dpatch.file=<dummy patch to CHANGES.txt> -Dforrest.home=<path to forrest> -Dfindbugs.home=<path to findbugs> -Dscratch.dir=/tmp/testpatch -Djava5.home=<path to java5> test-patch
 2. Run any test that creates a MiniMRCluster, say ant test 
-Dtestcase=TestFileArgs (contrib/streaming)

Expected result: Test should succeed.
Actual result: Test hangs in MiniMRCluster.<init>. This does not happen if we 
run ant clean after ant test-patch.

Test output:
{code}
[junit] 11/01/27 12:11:43 INFO ipc.Server: IPC Server handler 3 on 58675: 
starting
[junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: TaskTracker up at: 
localhost.localdomain/127.0.0.1:58675
[junit] 11/01/27 12:11:43 INFO mapred.TaskTracker: Starting tracker 
tracker_host0.foo.com:localhost.localdomain/127.0.0.1:58675
[junit] 11/01/27 12:11:44 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 0 time(s).
[junit] 11/01/27 12:11:45 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 1 time(s).
[junit] 11/01/27 12:11:46 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 2 time(s).
[junit] 11/01/27 12:11:47 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 3 time(s).
[junit] 11/01/27 12:11:48 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 4 time(s).
[junit] 11/01/27 12:11:49 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 5 time(s).
[junit] 11/01/27 12:11:50 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 6 time(s).
[junit] 11/01/27 12:11:51 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 7 time(s).
[junit] 11/01/27 12:11:52 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 8 time(s).
[junit] 11/01/27 12:11:53 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:0. Already tried 9 time(s).
[junit] 11/01/27 12:11:53 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not 
available yet, Z...
{code}

Stack trace: 

{code}
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:611)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:429)
- locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:504)
- locked 0x7f3b8dc08700 (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:206)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164)
at org.apache.hadoop.ipc.Client.call(Client.java:1008)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at org.apache.hadoop.mapred.$Proxy11.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:235)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:275)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:206)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:185)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:169)
at org.apache.hadoop.mapred.TaskTracker$2.run(TaskTracker.java:699)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:695)
- locked 0x7f3b8ccc3870 (a org.apache.hadoop.mapred.TaskTracker)
at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1391)
at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.createTaskTracker(MiniMRCluster.java:219)
at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner$1.run(MiniMRCluster.java:203)
at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner$1.run(MiniMRCluster.java:201)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142)
at 

[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster

2011-01-27 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987733#action_12987733
 ] 

Ramkumar Vadali commented on MAPREDUCE-2283:


Jira for MiniMRCluster problem: MAPREDUCE-2285

 TestBlockFixer hangs initializing MiniMRCluster
 ---

 Key: MAPREDUCE-2283
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0

 Attachments: MAPREDUCE-2283.patch


 TestBlockFixer (a raid contrib test) is hanging the precommit testing on 
 Hudson




[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster

2011-01-26 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987303#action_12987303
 ] 

Ramkumar Vadali commented on MAPREDUCE-2283:


I think this has something to do with the MR ports change. Is it possible that 
Hudson does not do ant clean? This test has not changed recently, but 
MiniMRCluster has.

 TestBlockFixer hangs initializing MiniMRCluster
 ---

 Key: MAPREDUCE-2283
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0


 TestBlockFixer (a raid contrib test) is hanging the precommit testing on 
 Hudson




[jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster

2011-01-26 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987399#action_12987399
 ] 

Ramkumar Vadali commented on MAPREDUCE-2283:


Please turn it off while I figure out what's happening.

 TestBlockFixer hangs initializing MiniMRCluster
 ---

 Key: MAPREDUCE-2283
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0


 TestBlockFixer (a raid contrib test) is hanging the precommit testing on 
 Hudson




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-24 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Attachment: MAPREDUCE-2250.2.patch

Pulling in a test fix from another jira.

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.2.patch, 
 MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Commented: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-24 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986049#action_12986049
 ] 

Ramkumar Vadali commented on MAPREDUCE-2250:


TEST RESULTS:

{code}
 [exec]
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]
 [exec] +1 system test framework.  The patch passed system test 
framework compile.
 [exec]
 [exec]
 [exec]
 [exec]
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec]
 [exec]
{code}

{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 498.696 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 153.311 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 969.737 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.785 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 17.575 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.295 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.038 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 14.459 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 65.327 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 512.67 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 254.251 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 41.865 sec
[junit] Running org.apache.hadoop.raid.TestRaidShellFsck
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 257.72 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.654 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.779 sec

test:

BUILD SUCCESSFUL
{code}

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.2.patch, 
 MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2279) Improper byte -> int conversion in DistributedRaidFileSystem

2011-01-21 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2279:
---

Attachment: MAPREDUCE-2279.1.patch

Minor optimization in read()

 Improper byte -> int conversion in DistributedRaidFileSystem
 

 Key: MAPREDUCE-2279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2279
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2279.1.patch, MAPREDUCE-2279.patch


 When returning a byte value from DistributedRaidFileSystem.read(), we should 
 compute 0xff & byteVal. Otherwise the returned int value will be incorrectly 
 negative. This is a regression from MAPREDUCE-2248.
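
 To illustrate the conversion (a standalone sketch, not the DistributedRaidFileSystem code): java.io.InputStream.read() must return a value in the range 0 to 255, or -1 at end of stream, so a plain widening of a byte such as 0xff yields -1 and looks like end of stream. The helper name toUnsignedInt is hypothetical.

```java
// Hypothetical helper, not the actual DistributedRaidFileSystem code.
final class ByteToInt {
  /** Converts a byte to the 0..255 int that InputStream.read() must return. */
  static int toUnsignedInt(byte byteVal) {
    return 0xff & byteVal;
  }

  public static void main(String[] args) {
    byte b = (byte) 0xfe;
    System.out.println((int) b);           // plain widening sign-extends: -2
    System.out.println(toUnsignedInt(b));  // masked value stays in 0..255: 254
  }
}
```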




[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-01-20 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2267:
---

Attachment: MAPREDUCE-2267.3.patch

Fixed RaidShell to not invoke the recoverFile RPC but use 
DistributedRaidFileSystem to read a corrupt file.

 Parallelize reading of blocks within a stripe
 -

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, 
 MAPREDUCE-2267.3.patch, MAPREDUCE-2267.patch


 RAID code has several instances where several blocks of data have to be read 
 to perform an operation. For example, computing a parity block requires 
 reading the blocks of the source file. Similarly, generating a fixed block 
 requires reading a parity block and the good blocks from the source file. 
 These read operations proceed sequentially currently. RAID code should use a 
 thread pool to increase the parallelism and thus reduce latency.




[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-01-20 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2267:
---

Attachment: MAPREDUCE-2267.4.patch

Attached diff from top level

 Parallelize reading of blocks within a stripe
 -

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, 
 MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch


 RAID code has several instances where several blocks of data have to be read 
 to perform an operation. For example, computing a parity block requires 
 reading the blocks of the source file. Similarly, generating a fixed block 
 requires reading a parity block and the good blocks from the source file. 
 These read operations proceed sequentially currently. RAID code should use a 
 thread pool to increase the parallelism and thus reduce latency.




[jira] Commented: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-01-20 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984354#action_12984354
 ] 

Ramkumar Vadali commented on MAPREDUCE-2267:


TEST RESULTS

{code}

test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  
jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 341.649 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 239.963 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 880.943 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.681 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 18.833 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.293 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.037 sec
[junit] Running org.apache.hadoop.raid.TestParallelReader
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.141 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.981 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 70.121 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 547.15 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 137.672 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 22.473 sec
[junit] Running org.apache.hadoop.raid.TestRaidShellFsck
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 266.466 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.016 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.729 sec

test:

BUILD SUCCESSFUL
Total time: 43 minutes 5 seconds

{code}

{code}
 [exec] 
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 9 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec] 

{code}

 Parallelize reading of blocks within a stripe
 -

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, 
 MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch


 RAID code has several instances where several blocks of data have to be read 
 to perform an operation. For example, computing a parity block requires 
 reading the blocks of the source file. Similarly, generating a fixed block 
 requires reading a parity block and the good blocks from the source file. 
 

[jira] Updated: (MAPREDUCE-2279) Improper byte -> int conversion in DistributedRaidFileSystem

2011-01-20 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2279:
---

Attachment: MAPREDUCE-2279.patch

 Improper byte -> int conversion in DistributedRaidFileSystem
 

 Key: MAPREDUCE-2279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2279
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2279.patch


 When returning a byte value from DistributedRaidFileSystem.read(), we should do 
 0xff & byteVal. Otherwise byte values of 0x80 and above are sign-extended and 
 the returned int value will be incorrectly negative.
 This is a regression from MAPREDUCE-2248.
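
To illustrate the conversion, here is a minimal Java sketch. The class and 
method names (ByteReadExample, readSignExtended, readMasked) are hypothetical 
and not taken from the patch; only the 0xff & byteVal expression comes from the 
description above. It relies on the java.io.InputStream.read() contract of 
returning a value in 0..255, or -1 at end of stream.

```java
// Illustrative sketch only; names are hypothetical, not from MAPREDUCE-2279.patch.
final class ByteReadExample {
    /** Buggy variant: widening a byte sign-extends, so 0x80..0xff become negative. */
    static int readSignExtended(byte byteVal) {
        return byteVal;
    }

    /** Fixed variant: keep only the low 8 bits, giving a value in 0..255. */
    static int readMasked(byte byteVal) {
        return 0xff & byteVal;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xe9;
        System.out.println(readSignExtended(b)); // -23
        System.out.println(readMasked(b));       // 233
    }
}
```

A negative return value is especially harmful here because callers commonly 
treat -1 as end of stream.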




[jira] Updated: (MAPREDUCE-2274) Generalize block fixer scheduler options

2011-01-19 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2274:
---

Description: The Raid block fixer currently allows the specification of the 
fair scheduler pool name. This is not generic since it assumes usage of the 
fair scheduler. Also this does not allow multiple options to be set, just the 
pool name. This is similar to MAPREDUCE-1818  (was: The Raid block fixer 
currently allows the specification of the fair scheduler pool name. This is not 
generic since it assumes usage of the fair scheduler. Also this does not allow 
multiple options to be set, just the pool name.)

 Generalize block fixer scheduler options
 

 Key: MAPREDUCE-2274
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2274
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 The Raid block fixer currently allows the specification of the fair scheduler 
 pool name. This is not generic since it assumes usage of the fair scheduler. 
 Also this does not allow multiple options to be set, just the pool name. This 
 is similar to MAPREDUCE-1818




[jira] Updated: (MAPREDUCE-2274) Generalize block fixer scheduler options

2011-01-19 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2274:
---

Priority: Minor  (was: Major)

 Generalize block fixer scheduler options
 

 Key: MAPREDUCE-2274
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2274
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

 The Raid block fixer currently allows the specification of the fair scheduler 
 pool name. This is not generic since it assumes usage of the fair scheduler. 
 Also this does not allow multiple options to be set, just the pool name. This 
 is similar to MAPREDUCE-1818




[jira] Created: (MAPREDUCE-2275) RaidNode should monitor and fix blocks that violate RAID block placement

2011-01-19 Thread Ramkumar Vadali (JIRA)
RaidNode should monitor and fix blocks that violate RAID block placement 
-

 Key: MAPREDUCE-2275
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2275
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


When files are RAIDed, it is important to keep blocks in each RAID stripe and 
the corresponding parity blocks on as many different machines as possible. This 
minimizes the probability of data loss when data nodes die.

BlockPlacementPolicyRaid ensures that parity blocks are not located on the same 
machines as the source blocks. But source block placement is not controlled 
directly in this manner. Instead, source blocks are created using 
the default policy. After a source file is RAIDed, its replication is 
increased and then decreased. BlockPlacementPolicyRaid then tries to keep the 
source blocks well-located when excess blocks are deleted. This is not 
guaranteed to produce the correct block placement for RAID.

Also, if blocks are moved around by the balancer, the block placement could be 
violated.

We need periodic monitoring of block placement of RAIDed files and the 
corresponding parity blocks.
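
As a rough illustration of what such monitoring might check, the Java sketch 
below counts how many blocks of one stripe (source and parity together) each 
host holds and reports hosts with more than one. Everything here is an 
assumption made for illustration: StripePlacementCheckSketch and crowdedHosts 
are hypothetical names, and block locations are passed in as plain host-name 
lists rather than obtained from the NameNode.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not RaidNode code.
final class StripePlacementCheckSketch {
    /**
     * blockHosts holds one entry per block in the stripe (source and parity);
     * each entry lists the hosts that store a replica of that block.
     * Returns the hosts holding two or more of the stripe's blocks, with counts.
     */
    static Map<String, Integer> crowdedHosts(List<List<String>> blockHosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> hosts : blockHosts) {
            for (String host : hosts) {
                counts.merge(host, 1, Integer::sum);
            }
        }
        // Keep only hosts that violate the "one block per machine" goal.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }
}
```

An empty result means the stripe is spread over distinct machines; any entry 
marks a host whose failure would take out several blocks of the same stripe.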




[jira] Created: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-01-14 Thread Ramkumar Vadali (JIRA)
Parallelize reading of blocks within a stripe
-

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.patch

RAID code has several instances where several blocks of data have to be read to 
perform an operation. For example, computing a parity block requires reading 
the blocks of the source file. Similarly, generating a fixed block requires 
reading a parity block and the good blocks from the source file. These read 
operations currently proceed sequentially. RAID code should use a thread pool 
to increase parallelism and thus reduce latency.
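
The thread-pool idea could look roughly like the following Java sketch. It is 
not the code from the attached patch: ParallelStripeReadSketch, BlockSource, 
and readAll are hypothetical names, and each block read is reduced to a 
callback so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; not the implementation in MAPREDUCE-2267.patch.
final class ParallelStripeReadSketch {
    /** Stand-in for reading one block; a real reader would perform I/O here. */
    interface BlockSource {
        byte[] read() throws Exception;
    }

    /** Reads all blocks of a stripe concurrently; results keep the input order. */
    static List<byte[]> readAll(List<BlockSource> sources, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<byte[]>> tasks = new ArrayList<>();
            for (BlockSource source : sources) {
                tasks.add(source::read);
            }
            List<byte[]> results = new ArrayList<>();
            // invokeAll blocks until every task is done and returns futures in task order.
            for (Future<byte[]> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a read failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With sequential reads the latency is roughly the sum of the individual block 
reads; with a pool at least as large as the stripe it approaches the slowest 
single read.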




[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-01-14 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2267:
---

Attachment: MAPREDUCE-2267.patch

 Parallelize reading of blocks within a stripe
 -

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.patch


 RAID code has several instances where several blocks of data have to be read 
 to perform an operation. For example, computing a parity block requires 
 reading the blocks of the source file. Similarly, generating a fixed block 
 requires reading a parity block and the good blocks from the source file. 
 These read operations currently proceed sequentially. RAID code should use a 
 thread pool to increase parallelism and thus reduce latency.




[jira] Updated: (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2011-01-14 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2267:
---

Attachment: MAPREDUCE-2267.1.patch

Fixing failures found during tests.

 Parallelize reading of blocks within a stripe
 -

 Key: MAPREDUCE-2267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.patch


 RAID code has several instances where several blocks of data have to be read 
 to perform an operation. For example, computing a parity block requires 
 reading the blocks of the source file. Similarly, generating a fixed block 
 requires reading a parity block and the good blocks from the source file. 
 These read operations currently proceed sequentially. RAID code should use a 
 thread pool to increase parallelism and thus reduce latency.




[jira] Commented: (MAPREDUCE-2239) BlockPlacementPolicyRaid should call getBlockLocations only when necessary

2011-01-13 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981583#action_12981583
 ] 

Ramkumar Vadali commented on MAPREDUCE-2239:


Do you need to change FSNamesystem.LOG.debug -> FSNamesystem.LOG.info?

 BlockPlacementPolicyRaid should call getBlockLocations only when necessary
 --

 Key: MAPREDUCE-2239
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2239
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2239.txt


 Currently BlockPlacementPolicyRaid calls getBlockLocations for every 
 chooseTarget().
 This puts pressure on NameNode. We should avoid calling if this file is not 
 raided or a parity file.




[jira] Commented: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-12 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980951#action_12980951
 ] 

Ramkumar Vadali commented on MAPREDUCE-2248:


TEST RESULTS
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 524.787 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 154.653 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 944.872 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 13.241 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 17.78 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.293 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.007 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 178.351 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 646.931 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 253.727 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 21.994 sec
[junit] Running org.apache.hadoop.raid.TestRaidShellFsck
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 270.783 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.14 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.769 sec
{code}

{code}

 [exec] 
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 4 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec] 
{code}


 DistributedRaidFileSystem should unraid only the corrupt block
 --

 Key: MAPREDUCE-2248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch


 DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
 It is better to unraid just the corrupt block and use the rest of the file as 
 normal. This becomes really important when we have terabyte-sized files.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Status: Open  (was: Patch Available)

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Attachment: MAPREDUCE-2250.1.patch

Update after svn up

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Created: (MAPREDUCE-2250) Fix log levels for error messages

2011-01-10 Thread Ramkumar Vadali (JIRA)
Fix log levels for error messages
-

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial


There are quite a few error messages being logged with a log level of info. 
That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix log levels for error messages

2011-01-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Attachment: MAPREDUCE-2250.patch

Fixes logging

 Fix log levels for error messages
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Summary: Fix logging in raid code.  (was: Fix log levels for error messages)

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Status: Patch Available  (was: Open)

Review at https://reviews.apache.org/r/266/

 Fix logging in raid code.
 -

 Key: MAPREDUCE-2250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Trivial
 Attachments: MAPREDUCE-2250.patch


 There are quite a few error messages being logged with a log level of info. 
 That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2248:
---

Attachment: MAPREDUCE-2248.1.patch

Addressed Scott's comments

 DistributedRaidFileSystem should unraid only the corrupt block
 --

 Key: MAPREDUCE-2248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch


 DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
 It is better to unraid just the corrupt block and use the rest of the file as 
 normal. This becomes really important when we have terabyte-sized files.




[jira] Created: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-06 Thread Ramkumar Vadali (JIRA)
DistributedRaidFileSystem should unraid only the corrupt block
--

 Key: MAPREDUCE-2248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
It is better to unraid just the corrupt block and use the rest of the file as 
normal. This becomes really important when we have terabyte-sized files.
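
To make "just the corrupt block" concrete, the Java sketch below computes the 
byte range of the block that contains a given file offset. This is an 
assumption about the arithmetic involved, not code from the patch: 
CorruptBlockRangeSketch and blockRange are hypothetical names, and the sketch 
assumes fixed-size blocks with a possibly shorter last block.

```java
// Illustrative sketch only; not DistributedRaidFileSystem code.
final class CorruptBlockRangeSketch {
    /**
     * Returns {start, endExclusive} of the block containing offset,
     * clipped to fileLength so that a short final block is handled.
     */
    static long[] blockRange(long offset, long blockSize, long fileLength) {
        long start = (offset / blockSize) * blockSize; // round down to a block boundary
        long end = Math.min(start + blockSize, fileLength);
        return new long[] {start, end};
    }
}
```

Only reads that fall inside this range would need recovered data; reads 
outside it can keep using the original file.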




[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-06 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2248:
---

Attachment: MAPREDUCE-2248.patch

review at https://reviews.apache.org/r/217/

 DistributedRaidFileSystem should unraid only the corrupt block
 --

 Key: MAPREDUCE-2248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2248.patch


 DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
 It is better to unraid just the corrupt block and use the rest of the file as 
 normal. This becomes really important when we have terabyte-sized files.




[jira] Created: (MAPREDUCE-2245) Failure metrics for block fixer

2011-01-05 Thread Ramkumar Vadali (JIRA)
Failure metrics for block fixer
---

 Key: MAPREDUCE-2245
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2245
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


Publish file fixing failure metrics for the block fixer.




[jira] Created: (MAPREDUCE-2246) Timeout for fixing a file

2011-01-05 Thread Ramkumar Vadali (JIRA)
Timeout for fixing a file
-

 Key: MAPREDUCE-2246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2246
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


If the DistBlockFixer takes a long time to fix a file, it would be better to 
time out and try again in a new MR job.
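
The timeout-and-retry pattern can be sketched in plain Java as below. This is 
only an in-process analogy under stated assumptions: the issue is about 
MapReduce jobs, whereas the sketch uses a local executor, and FixTimeoutSketch 
and runWithTimeout are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only; not DistBlockFixer code.
final class FixTimeoutSketch {
    /** Runs fix with a deadline. Returns false, after cancelling it, if the deadline passes. */
    static boolean runWithTimeout(Callable<Void> fix, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(fix);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow attempt so the caller can retry
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would resubmit the file for fixing whenever this returns false.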




[jira] Created: (MAPREDUCE-2240) DistBlockFixer could sleep indefinitely

2011-01-04 Thread Ramkumar Vadali (JIRA)
DistBlockFixer could sleep indefinitely
---

 Key: MAPREDUCE-2240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


DistributedBlockFixer computes its sleep interval based on the amount of time 
spent in fixing jobs. This computation has a bug which can result in the sleep 
interval becoming negative, which would make the distributed block fixer sleep 
indefinitely.
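
The issue does not show the faulty computation, so the Java sketch below only 
assumes that the interval is a target period minus the time already spent, and 
shows one way to keep it from going negative. FixerSleepSketch and sleepMillis 
are hypothetical names.

```java
// Illustrative sketch only; the real computation in the block fixer is not shown in the issue.
final class FixerSleepSketch {
    /** Time left in the current period, clamped so that it is never negative. */
    static long sleepMillis(long periodMs, long elapsedMs) {
        return Math.max(0L, periodMs - elapsedMs);
    }

    public static void main(String[] args) {
        System.out.println(sleepMillis(1000L, 250L));  // 750
        System.out.println(sleepMillis(1000L, 4000L)); // 0
    }
}
```

Clamping to zero means a cycle that overruns its period simply starts the next 
cycle immediately.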




[jira] Commented: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched

2010-12-11 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970534#action_12970534
 ] 

Ramkumar Vadali commented on MAPREDUCE-2214:


TEST RESULTS

ant test-patch complains about unit-tests, but it's difficult to come up with a 
unit-test for this.
{code}
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no new tests are needed for this patch.
 [exec] Also please list what manual steps were performed to verify this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec]
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec]
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec]
{code}

ant test: there was only one test failure, but that fails in a clean checkout 
too.
{code}
[junit] Test org.apache.hadoop.mapred.TestControlledMapReduceJob FAILED (timeout)
{code}

 TaskTracker should release slot if task is not launched
 ---

 Key: MAPREDUCE-2214
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2214.patch


 TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not 
 in an expected state. However, in the case where the task is not launched, 
 the slot is not released. We have observed this in production - the task was 
 in SUCCEEDED state by the time launchTask() got to it and then the slot was 
 never released. It is not clear how the task got into that state, but it is 
 better to handle the case.
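
 The shape of the fix can be sketched in Java as follows. This is not the 
 TaskTracker code or the attached patch: SlotReleaseSketch, State, and 
 launchOrRelease are hypothetical names, a java.util.concurrent.Semaphore 
 stands in for the slot accounting, and the sketch assumes the caller has 
 already taken a slot for the task.

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch only; a Semaphore stands in for TaskTracker slot accounting.
final class SlotReleaseSketch {
    enum State { UNASSIGNED, RUNNING, SUCCEEDED }

    /**
     * Launches only from the expected state. If the task is not launched,
     * the slot taken for it is handed back. Returns true if launched.
     */
    static boolean launchOrRelease(State state, Semaphore slots) {
        if (state == State.UNASSIGNED) {
            return true; // launched: the task keeps its slot while it runs
        }
        slots.release(); // not launched: return the slot instead of leaking it
        return false;
    }
}
```

 Without the release on the not-launched path, each skipped task would 
 permanently reduce the number of usable slots.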




[jira] Updated: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched

2010-12-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2214:
---

Attachment: MAPREDUCE-2214.patch

 TaskTracker should release slot if task is not launched
 ---

 Key: MAPREDUCE-2214
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2214.patch


 TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not 
 in an expected state. However, in the case where the task is not launched, 
 the slot is not released. We have observed this in production - the task was 
 in SUCCEEDED state by the time launchTask() got to it and then the slot was 
 never released. It is not clear how the task got into that state, but it is 
 better to handle the case.




[jira] Updated: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched

2010-12-10 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2214:
---

Status: Patch Available  (was: Open)

 TaskTracker should release slot if task is not launched
 ---

 Key: MAPREDUCE-2214
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2214.patch


 TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not 
 in an expected state. However, in the case where the task is not launched, 
 the slot is not released. We have observed this in production - the task was 
 in SUCCEEDED state by the time launchTask() got to it and then the slot was 
 never released. It is not clear how the task got into that state, but it is 
 better to handle the case.




[jira] Commented: (MAPREDUCE-1831) BlockPlacement policy for RAID

2010-12-09 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969887#action_12969887
 ] 

Ramkumar Vadali commented on MAPREDUCE-1831:


+1 looks good. Please run unit-tests and test-patch.

 BlockPlacement policy for RAID
 --

 Key: MAPREDUCE-1831
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.23.0

 Attachments: MAPREDUCE-1831-v2.txt, MAPREDUCE-1831.20100610.txt, 
 MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt


 RAID introduces a new dependency between the blocks within a file.
 The blocks help decode each other, so we should avoid putting them on the 
 same machine.
 The proposed BlockPlacementPolicy does the following:
 1. When writing parity blocks, it avoids placing parity blocks and source 
 blocks together.
 2. When reducing the replication factor, it deletes the blocks that sit with 
 other dependent blocks.
 3. It does not change the way we write normal files. It only has different 
 behavior when processing raid files.
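
The first two rules can be sketched as follows. This is only an illustration under simplifying assumptions: nodes are plain strings, and `RaidPlacementSketch`, `eligibleNodes` and `replicaToDelete` are hypothetical names, not part of the BlockPlacementPolicy API or of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration; not the BlockPlacementPolicy API. */
final class RaidPlacementSketch {

    /** Rule 1: keep only candidate nodes that hold no block from the same stripe. */
    static List<String> eligibleNodes(List<String> candidates, Set<String> nodesHoldingStripe) {
        List<String> eligible = new ArrayList<>();
        for (String node : candidates) {
            if (!nodesHoldingStripe.contains(node)) {
                eligible.add(node);
            }
        }
        return eligible;
    }

    /**
     * Rule 2: when a replica must go, pick the node that also stores the most
     * dependent blocks. Ties go to the earliest node in the list.
     */
    static String replicaToDelete(List<String> replicaNodes, Map<String, Integer> dependentBlocksOnNode) {
        String worst = null;
        int worstCount = -1;
        for (String node : replicaNodes) {
            int count = dependentBlocksOnNode.getOrDefault(node, 0);
            if (count > worstCount) {
                worst = node;
                worstCount = count;
            }
        }
        return worst; // null only if replicaNodes is empty
    }
}
```

Rule 3 needs no code here: for files that are not raided, the set of nodes holding dependent blocks is empty, so both helpers leave the default choice untouched.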




[jira] Created: (MAPREDUCE-2214) TaskTracker should release slot if task is not launched

2010-12-08 Thread Ramkumar Vadali (JIRA)
TaskTracker should release slot if task is not launched
---

 Key: MAPREDUCE-2214
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not in 
an expected state. However, in the case where the task is not launched, the 
slot is not released. We have observed this in production - the task was in 
SUCCEEDED state by the time launchTask() got to it and then the slot was never 
released. It is not clear how the task got into that state, but it is better to 
handle the case.
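
A minimal sketch of the failure mode and the proposed handling, assuming a much simplified slot counter. `SlotPool`, `SketchTask` and its `State` enum are hypothetical stand-ins for illustration; they are not the TaskTracker classes, and the real fix is in the attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the TaskTracker's slot accounting. */
final class SlotPool {
    private final AtomicInteger free;

    SlotPool(int slots) { free = new AtomicInteger(slots); }

    void acquire() { free.decrementAndGet(); }
    void release() { free.incrementAndGet(); }
    int freeSlots() { return free.get(); }
}

/** Hypothetical task with only the states needed for the illustration. */
final class SketchTask {
    enum State { UNASSIGNED, RUNNING, SUCCEEDED }

    private State state;
    private final SlotPool pool;

    SketchTask(State initial, SlotPool pool) {
        this.state = initial;
        this.pool = pool;
        pool.acquire(); // a slot is reserved when the task is assigned
    }

    /** Launches only from UNASSIGNED; otherwise hands the slot back. */
    boolean launch() {
        if (state != State.UNASSIGNED) {
            pool.release(); // the step the issue reports as missing
            return false;
        }
        state = State.RUNNING;
        return true;
    }
}
```

Without the `release()` call in the early-return branch, a task that is already in SUCCEEDED state would keep its slot reserved indefinitely, which matches the behaviour described above.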




[jira] Commented: (MAPREDUCE-2156) Raid-aware FSCK

2010-12-01 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965744#action_12965744
 ] 

Ramkumar Vadali commented on MAPREDUCE-2156:


+1, looks good.

 Raid-aware FSCK
 ---

 Key: MAPREDUCE-2156
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2156
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Patrick Kling
Assignee: Patrick Kling
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2156.2.patch, MAPREDUCE-2156.patch


 Currently, FSCK reports files as corrupt even if they can be fixed using 
 parity blocks. We need a tool that only reports files that are irreparably 
 corrupt (i.e., files for which too many data or parity blocks belonging to 
 the same stripe have been lost or corrupted).
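
The "irreparably corrupt" test can be sketched as below. It assumes an erasure code that can recover a stripe as long as the number of lost blocks (data plus parity) does not exceed the number of parity blocks per stripe, which holds for XOR with one parity block and for Reed-Solomon. `StripeCheckSketch` and its method names are hypothetical, not the tool's API.

```java
/** Hypothetical illustration of a per-stripe recoverability check. */
final class StripeCheckSketch {

    /** A stripe is recoverable while its losses do not exceed the parity length. */
    static boolean isRecoverable(int lostInStripe, int parityLength) {
        return lostInStripe <= parityLength;
    }

    /**
     * lostPerStripe[i] is the number of lost or corrupt blocks (data and parity)
     * in stripe i. The file is irreparable if any single stripe is not recoverable.
     */
    static boolean isIrreparablyCorrupt(int[] lostPerStripe, int parityLength) {
        for (int lost : lostPerStripe) {
            if (!isRecoverable(lost, parityLength)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with one parity block per stripe, a file that lost one block in each of two different stripes is still repairable, while a file that lost two blocks in the same stripe is not.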




[jira] Commented: (MAPREDUCE-2155) RaidNode should optionally dispatch map reduce jobs to fix corrupt blocks (instead of fixing locally)

2010-11-29 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964904#action_12964904
 ] 

Ramkumar Vadali commented on MAPREDUCE-2155:


+1. Latest patch looks good to me.

 RaidNode should optionally dispatch map reduce jobs to fix corrupt blocks 
 (instead of fixing locally)
 -

 Key: MAPREDUCE-2155
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2155
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Patrick Kling
Assignee: Patrick Kling
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2155.2.patch, MAPREDUCE-2155.patch


 Recomputing blocks based on parity information is expensive. Rather than 
 doing this locally at the RaidNode, we should run map reduce jobs. This will 
 allow us to quickly fix a large number of corrupt or missing blocks.




[jira] Commented: (MAPREDUCE-1367) LocalJobRunner should support parallel mapper execution

2010-11-29 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965054#action_12965054
 ] 

Ramkumar Vadali commented on MAPREDUCE-1367:


@Aaron, Just curious, is this being used in production? If so, could you please 
outline the use case?

 LocalJobRunner should support parallel mapper execution
 ---

 Key: MAPREDUCE-1367
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1367
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1367.2.patch, MAPREDUCE-1367.3.patch, 
 MAPREDUCE-1367.4.patch, MAPREDUCE-1367.5.patch, MAPREDUCE-1367.6.patch, 
 MAPREDUCE-1367.7.patch, MAPREDUCE-1367.patch


 The LocalJobRunner currently supports only a single execution thread. Given 
 the prevalence of multi-core CPUs, it makes sense to allow users to run 
 multiple tasks in parallel for improved performance on small (local-only) 
 jobs.
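
The general idea of running map tasks on several threads can be sketched with a fixed-size thread pool. This is not the LocalJobRunner code: `ParallelLocalRunnerSketch` and `runAll` are hypothetical, and map tasks are modelled as plain `Callable`s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration; map tasks are modelled as Callables. */
final class ParallelLocalRunnerSketch {

    /** Runs all tasks on a pool of the given size and returns results in task order. */
    static <T> List<T> runAll(List<Callable<T>> mapTasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<T> done : pool.invokeAll(mapTasks)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With `threads` set to 1 this degenerates to the single-threaded behaviour described above.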




[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run

2010-11-19 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1783:
---

Attachment: MAPREDUCE-1783.patch

Patch after svn up

 Task Initialization should be delayed till when a job can be run
 

 Key: MAPREDUCE-1783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.22.0

 Attachments: 0001-Pool-aware-job-initialization.patch, 
 0001-Pool-aware-job-initialization.patch.1, MAPREDUCE-1783.patch, 
 submit-mapreduce-1783.patch


 The FairScheduler task scheduler uses PoolManager to impose limits on the 
 number of jobs that can be running at a given time. However, jobs that are 
 submitted are initialized immediately by EagerTaskInitializationListener by 
 calling JobInProgress.initTasks. This causes the job split file to be read 
 into memory. The split information is not needed until the number of running 
 jobs is less than the maximum specified. If the amount of split information 
 is large, this leads to unnecessary memory pressure on the Job Tracker.
 To ease memory pressure, FairScheduler can use another implementation of 
 JobInProgressListener that is aware of PoolManager limits and can delay task 
 initialization until the number of running jobs is below the maximum.
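
The delayed initialization can be sketched as a listener that queues submitted jobs and initializes them only while the running count is below the limit. This is an assumption-laden simplification: `DelayedInitSketch` is hypothetical, jobs are plain strings, there is a single global limit instead of per-pool limits, and an initialized job is treated as running.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical illustration; not the JobInProgressListener API. */
final class DelayedInitSketch {
    private final int maxRunning;
    private final Queue<String> waiting = new ArrayDeque<>();
    private final List<String> initialized = new ArrayList<>();

    DelayedInitSketch(int maxRunning) { this.maxRunning = maxRunning; }

    /** Called on submission: the job is queued, not initialized eagerly. */
    void jobAdded(String jobId) {
        waiting.add(jobId);
        initEligible();
    }

    /** Called on completion: frees capacity for the next waiting job. */
    void jobCompleted(String jobId) {
        initialized.remove(jobId);
        initEligible();
    }

    private void initEligible() {
        while (initialized.size() < maxRunning && !waiting.isEmpty()) {
            // In the real system the split file would be read only at this point.
            initialized.add(waiting.poll());
        }
    }

    List<String> initializedJobs() { return new ArrayList<>(initialized); }
    int waitingCount() { return waiting.size(); }
}
```

Jobs beyond the limit cost only a queue entry until capacity frees up, which is the memory saving the description is after.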




[jira] Commented: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run

2010-11-19 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934065#action_12934065
 ] 

Ramkumar Vadali commented on MAPREDUCE-1783:


Latest patch TEST RESULTS:

One test fails, but that also fails on a clean checkout
{code}
[junit] Test org.apache.hadoop.mapred.TestControlledMapReduceJob FAILED (timeout)
{code}

ant test-patch succeeds:
{code}
 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] Finished build.

BUILD SUCCESSFUL
Total time: 13 minutes 6 seconds
Test results are in /tmp/rvadali.hadoopQA

{code}

 Task Initialization should be delayed till when a job can be run
 

 Key: MAPREDUCE-1783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.22.0

 Attachments: 0001-Pool-aware-job-initialization.patch, 
 0001-Pool-aware-job-initialization.patch.1, MAPREDUCE-1783.patch, 
 submit-mapreduce-1783.patch


 The FairScheduler task scheduler uses PoolManager to impose limits on the 
 number of jobs that can be running at a given time. However, jobs that are 
 submitted are initialized immediately by EagerTaskInitializationListener by 
 calling JobInProgress.initTasks. This causes the job split file to be read 
 into memory. The split information is not needed until the number of running 
 jobs is less than the maximum specified. If the amount of split information 
 is large, this leads to unnecessary memory pressure on the Job Tracker.
 To ease memory pressure, FairScheduler can use another implementation of 
 JobInProgressListener that is aware of PoolManager limits and can delay task 
 initialization until the number of running jobs is below the maximum.




[jira] Updated: (MAPREDUCE-2159) Provide metrics for RaidNode

2010-11-18 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2159:
---

Attachment: MAPREDUCE-2159.patch

Adding the classes RaidNodeMetrics and TestRaidNodeMetrics

 Provide metrics for RaidNode
 

 Key: MAPREDUCE-2159
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2159
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2159.patch


 It will be useful to have the following metrics for RAID:
  - files raided
  - files too new to be raided
  - files too small to be raided
  - number of blocks fixed using raid.
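
One possible shape for these counters is sketched below. It is not the RaidNodeMetrics class from the patch: `RaidMetricsSketch`, the `Counter` enum, and the `minAgeMillis` / `minBlocks` thresholds are hypothetical names chosen for the illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical illustration of the four counters listed above. */
final class RaidMetricsSketch {
    enum Counter { FILES_RAIDED, FILES_TOO_NEW, FILES_TOO_SMALL, BLOCKS_FIXED }

    private final Map<Counter, Long> counts = new EnumMap<>(Counter.class);

    synchronized void inc(Counter c, long delta) {
        counts.merge(c, delta, Long::sum);
    }

    synchronized long get(Counter c) {
        return counts.getOrDefault(c, 0L);
    }

    /** Classifies one file and bumps the matching counter; thresholds are assumptions. */
    Counter recordFile(long ageMillis, int numBlocks, long minAgeMillis, int minBlocks) {
        Counter c = ageMillis < minAgeMillis ? Counter.FILES_TOO_NEW
                  : numBlocks < minBlocks ? Counter.FILES_TOO_SMALL
                  : Counter.FILES_RAIDED;
        inc(c, 1);
        return c;
    }
}
```

The age check runs before the size check here, so a file that is both too new and too small is counted once, as too new. That ordering is a choice made for the sketch.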




[jira] Updated: (MAPREDUCE-2159) Provide metrics for RaidNode

2010-11-18 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2159:
---

Status: Patch Available  (was: Open)

 Provide metrics for RaidNode
 

 Key: MAPREDUCE-2159
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2159
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2159.patch


 It will be useful to have the following metrics for RAID:
  - files raided
  - files too new to be raided
  - files too small to be raided
  - number of blocks fixed using raid.




[jira] Created: (MAPREDUCE-2189) RAID Parallel traversal needs to synchronize stats

2010-11-15 Thread Ramkumar Vadali (JIRA)
RAID Parallel traversal needs to synchronize stats
--

 Key: MAPREDUCE-2189
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2189
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


The implementation of multi-threaded directory traversal does not update stats 
in a thread-safe manner
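
One common way to make such updates thread-safe is an atomic counter, sketched below. Whether the attached patch uses atomics or synchronization is not stated here; `TraversalStatsSketch` and its methods are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration of a stat updated safely from several threads. */
final class TraversalStatsSketch {
    private final AtomicLong filesSeen = new AtomicLong();

    void fileSeen() { filesSeen.incrementAndGet(); }
    long total() { return filesSeen.get(); }

    /** Simulates worker threads that each report filesPerThread files. */
    static long countInParallel(int threads, int filesPerThread) {
        TraversalStatsSketch stats = new TraversalStatsSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < filesPerThread; i++) {
                    stats.fileSeen();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats.total();
    }
}
```

With a plain `long` field and `filesSeen++`, concurrent increments can be lost and the total can come out lower than the number of files visited.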




[jira] Updated: (MAPREDUCE-2189) RAID Parallel traversal needs to synchronize stats

2010-11-15 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2189:
---

Attachment: MAPREDUCE-2189.patch

 RAID Parallel traversal needs to synchronize stats
 --

 Key: MAPREDUCE-2189
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2189
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2189.patch


 The implementation of multi-threaded directory traversal does not update 
 stats in a thread-safe manner




[jira] Created: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API

2010-11-12 Thread Ramkumar Vadali (JIRA)
Port DistRaid.java to new mapreduce API
---

 Key: MAPREDUCE-2184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


DistRaid.java was implemented with the older mapred API; this task is for 
porting it to the new API.




[jira] Updated: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API

2010-11-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2184:
---

Component/s: contrib/raid

 Port DistRaid.java to new mapreduce API
 ---

 Key: MAPREDUCE-2184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 DistRaid.java was implemented with the older mapred API; this task is for 
 porting it to the new API.




[jira] Created: (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()

2010-11-12 Thread Ramkumar Vadali (JIRA)
DistributedRaidFileSystem should implement getFileBlockLocations()
--

 Key: MAPREDUCE-2186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali


If a RAIDed file has missing blocks, 
DistributedRaidFileSystem.getFileBlockLocations() would return no block 
locations. This could lead a client to believe that the file is not readable. 
But if parity data is available, the file actually is readable.

It would be better to implement getFileBlockLocations() and return the location 
of the parity blocks that would be needed to reconstruct the missing block.
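
The proposal can be sketched as a per-block fallback. This is a simplification, not the DistributedRaidFileSystem code: hosts are plain strings, `RaidLocationsSketch` is hypothetical, and the caller is assumed to already know which parity hosts cover each source block.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of falling back to parity block locations. */
final class RaidLocationsSketch {

    /**
     * sourceHosts.get(i) lists the hosts of source block i (empty if the block
     * is missing); parityHosts.get(i) lists the hosts of the parity blocks
     * that would be read to reconstruct block i.
     */
    static List<List<String>> locations(List<List<String>> sourceHosts,
                                        List<List<String>> parityHosts) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < sourceHosts.size(); i++) {
            List<String> hosts = sourceHosts.get(i);
            result.add(hosts.isEmpty() ? parityHosts.get(i) : hosts);
        }
        return result;
    }
}
```

Note that in the fallback case the returned hosts hold parity data rather than the block itself, so a scheduler using them for locality would be working from an approximation.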




[jira] Updated: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API

2010-11-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2184:
---

Status: Patch Available  (was: Open)

 Port DistRaid.java to new mapreduce API
 ---

 Key: MAPREDUCE-2184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2184.patch


 DistRaid.java was implemented with the older mapred API; this task is for 
 porting it to the new API.




[jira] Updated: (MAPREDUCE-2184) Port DistRaid.java to new mapreduce API

2010-11-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2184:
---

Attachment: MAPREDUCE-2184.patch

Review at https://reviews.apache.org/r/87/

TEST RESULTS:


ant test:

{code}

test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 377.746 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 133.511 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 11.485 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.063 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.396 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.265 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 68.93 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 455.365 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 215.837 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.015 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 14.912 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.207 sec

test:

BUILD SUCCESSFUL
Total time: 22 minutes 41 seconds
{code}

ant test-patch:
The errors are the same as on a clean trunk checkout.
{code}
 [exec] -1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] Finished build.


{code}


 Port DistRaid.java to new mapreduce API
 ---

 Key: MAPREDUCE-2184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2184.patch


 DistRaid.java was implemented with the older mapred API; this task is for 
 porting it to the new API.




[jira] Updated: (MAPREDUCE-2169) Integrated Reed-Solomon code with RaidNode

2010-11-09 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2169:
---

Attachment: MAPREDUCE-2169.2.patch

TEST RESULTS:

ant test under raid:
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 373.594 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 138.885 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 15.061 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.491 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.39 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.809 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 69.229 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 461.174 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 218.163 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 24.31 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 14.96 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.368 sec

test:

BUILD SUCCESSFUL
Total time: 22 minutes 53 seconds
{code}

ant test-patch has the same result as a clean checkout (see MAPREDUCE-2176)
{code}

 [exec] -1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 28 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] Finished build.

{code}

 Integrated Reed-Solomon code with RaidNode
 --

 Key: MAPREDUCE-2169
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2169
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2169.2.patch, MAPREDUCE-2169.patch


 Scott Chen recently checked in an implementation of the Reed-Solomon code. 
 This task will track the integration of the code with RaidNode.




[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node

2010-11-09 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2167:
---

Attachment: MAPREDUCE-2167.4.patch

Fixed a broken test.

TEST RESULTS:


ant test-patch has the same number of failures as a clean checkout

{code}
 [exec] -1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 4 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] Finished build.
{code}

ant test succeeds:

{code}


test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 47.071 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 124.583 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.337 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.481 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.392 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.485 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 71.136 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 471.072 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.828 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec

test:

BUILD SUCCESSFUL
Total time: 15 minutes 6 seconds
{code}


 Faster directory traversal for raid node
 

 Key: MAPREDUCE-2167
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, 
 MAPREDUCE-2167.4.patch, MAPREDUCE-2167.patch


 The RaidNode currently iterates over the directory structure to figure out 
 which files to RAID. With millions of files, this can take a long time - 
 especially if some files are already RAIDed and the RaidNode needs to look at 
 parity files / parity file HARs to determine if the file needs to be RAIDed.
 The directory traversal is encapsulated inside the class DirectoryTraversal, 
 which examines one file at a time, using the caller's thread.
 My proposal is to make this multi-threaded as follows:
  * use a pool of threads inside DirectoryTraversal
  * The caller's thread is used to retrieve directories, and each new 
 directory is assigned to a thread in the pool. The worker thread examines all 
 the files the directory.
  * If there sub-directories, those are added back as workitems to the pool.
 Comments?
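
The proposal above can be sketched roughly as follows. To stay self-contained the sketch walks an in-memory tree instead of a FileSystem, and it lets worker threads discover sub-directories themselves rather than using the caller's thread for that. `ParallelTraversalSketch` and `listFiles` are hypothetical names, not the DirectoryTraversal API.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: each directory is one work item in a thread pool. */
final class ParallelTraversalSketch {

    /** tree maps a directory path to its children; paths absent from the map are files. */
    static Queue<String> listFiles(Map<String, List<String>> tree, String root, int threads) {
        if (!tree.containsKey(root)) {
            throw new IllegalArgumentException("root is not a directory: " + root);
        }
        Queue<String> files = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger pending = new AtomicInteger(1); // directories queued or in progress
        CountDownLatch done = new CountDownLatch(1);
        submit(tree, root, pool, pending, done, files);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return files;
    }

    private static void submit(Map<String, List<String>> tree, String dir, ExecutorService pool,
                               AtomicInteger pending, CountDownLatch done, Queue<String> files) {
        pool.execute(() -> {
            for (String child : tree.get(dir)) {
                if (tree.containsKey(child)) {
                    pending.incrementAndGet(); // sub-directory goes back to the pool as a work item
                    submit(tree, child, pool, pending, done, files);
                } else {
                    files.add(child); // a real traversal would examine the file here
                }
            }
            // The last directory to finish releases the caller.
            if (pending.decrementAndGet() == 0) {
                done.countDown();
            }
        });
    }
}
```

The `pending` counter is incremented before a sub-directory is queued and decremented only after its parent has been fully processed, so it can reach zero only once every directory has been visited. The order in which files are reported is not deterministic.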




[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node

2010-11-08 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2167:
---

Attachment: MAPREDUCE-2167.3.patch

Added a comment explaining the use of the slots semaphore.

 Faster directory traversal for raid node
 

 Key: MAPREDUCE-2167
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, 
 MAPREDUCE-2167.patch


 The RaidNode currently iterates over the directory structure to figure out 
 which files to RAID. With millions of files, this can take a long time - 
 especially if some files are already RAIDed and the RaidNode needs to look at 
 parity files / parity file HARs to determine if the file needs to be RAIDed.
 The directory traversal is encapsulated inside the class DirectoryTraversal, 
 which examines one file at a time, using the caller's thread.
 My proposal is to make this multi-threaded as follows:
  * use a pool of threads inside DirectoryTraversal
  * The caller's thread is used to retrieve directories, and each new 
 directory is assigned to a thread in the pool. The worker thread examines all 
 the files the directory.
  * If there sub-directories, those are added back as workitems to the pool.
 Comments?




[jira] Assigned: (MAPREDUCE-2179) RaidBlockSender.java compilation fails

2010-11-08 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali reassigned MAPREDUCE-2179:
--

Assignee: Ramkumar Vadali

 RaidBlockSender.java compilation fails
 --

 Key: MAPREDUCE-2179
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Giridharan Kesavan
Assignee: Ramkumar Vadali
Priority: Blocker

 https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/490/consoleFull
 Mapreduce trunk compilation is broken with 
 compile:
  [echo] contrib: raid
 [javac] Compiling 27 source files to /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/build/contrib/raid/classes
 [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:71: cannot find symbol
 [javac] symbol  : class BlockTransferThrottler
 [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender
 [javac]   private BlockTransferThrottler throttler;
 [javac]   ^
 [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk/trunk/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:377: cannot find symbol
 [javac] symbol  : class BlockTransferThrottler
 [javac] location: class org.apache.hadoop.hdfs.server.datanode.RaidBlockSender
 [javac]  BlockTransferThrottler throttler) throws IOException {
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors




[jira] Updated: (MAPREDUCE-2179) RaidBlockSender.java compilation fails

2010-11-08 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2179:
---

Status: Patch Available  (was: Open)

 RaidBlockSender.java compilation fails
 --

 Key: MAPREDUCE-2179
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Giridharan Kesavan
Assignee: Ramkumar Vadali
Priority: Blocker
 Attachments: MAPREDUCE-2179.patch





[jira] Updated: (MAPREDUCE-2179) RaidBlockSender.java compilation fails

2010-11-08 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2179:
---

Attachment: MAPREDUCE-2179.patch

r1032836 (HDFS-1457) removed the class BlockTransferThrottler. The RAID code 
does not need that functionality, so this patch just removes the dependence on 
BlockTransferThrottler.

 RaidBlockSender.java compilation fails
 --

 Key: MAPREDUCE-2179
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Giridharan Kesavan
Assignee: Ramkumar Vadali
Priority: Blocker
 Attachments: MAPREDUCE-2179.patch





[jira] Commented: (MAPREDUCE-2179) RaidBlockSender.java compilation fails

2010-11-08 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929995#action_12929995
 ] 

Ramkumar Vadali commented on MAPREDUCE-2179:


ant test-patch will not run since trunk compilation is broken.

I have run raid unit-tests:

{code}

test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 41.64 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 139.487 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.169 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 26.334 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.399 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.051 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.473 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 70.16 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 406.19 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.977 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 26.168 sec

test:

BUILD SUCCESSFUL
Total time: 14 minutes 12 seconds

{code}

 RaidBlockSender.java compilation fails
 --

 Key: MAPREDUCE-2179
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2179
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Giridharan Kesavan
Assignee: Ramkumar Vadali
Priority: Blocker
 Attachments: MAPREDUCE-2179.patch





[jira] Commented: (MAPREDUCE-1704) Parity files that are outdated or nonexistent should be immediately disregarded

2010-11-06 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929239#action_12929239
 ] 

Ramkumar Vadali commented on MAPREDUCE-1704:


This is no longer an issue.

 Parity files that are outdated or nonexistent should be immediately 
 disregarded
 ---

 Key: MAPREDUCE-1704
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1704
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Rodrigo Schmidt
Assignee: Scott Chen
 Fix For: 0.22.0


 In the current implementation, old or nonexistent parity files are not 
 immediately disregarded. Absence will trigger exceptions, but old files could 
 lead to bad recoveries and maybe data corruption. This should be fixed.




[jira] Commented: (MAPREDUCE-1706) Log RAID recoveries on HDFS

2010-11-06 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929242#action_12929242
 ] 

Ramkumar Vadali commented on MAPREDUCE-1706:


This looks good to me.
+1

 Log RAID recoveries on HDFS
 ---

 Key: MAPREDUCE-1706
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1706
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Scott Chen
 Attachments: MAPREDUCE-1706.txt


 It would be good to have a way to centralize all the recovery logs, since 
 recovery can be executed by any hdfs client. The best place to store this 
 information is HDFS itself.




[jira] Resolved: (MAPREDUCE-2176) ant test-patch failing on a clean checkout

2010-11-05 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali resolved MAPREDUCE-2176.


Resolution: Duplicate

Dup of MAPREDUCE-2172

 ant test-patch failing on a clean checkout
 --

 Key: MAPREDUCE-2176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2176
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ramkumar Vadali

 ant test-patch fails for a dummy patch on CHANGES.txt:
 {code}
  [exec] 
  [exec] -1 overall.  
  [exec] 
  [exec] +1 @author.  The patch does not contain any @author tags.
  [exec] 
  [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
  [exec] Please justify why no new tests are needed for this patch.
  [exec] Also please list what manual steps were performed to verify this patch.
  [exec] 
  [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
  [exec] 
  [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
  [exec] 
  [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
  [exec] 
  [exec] -1 release audit.  The applied patch generated 3 release audit warnings (more than the trunk's current 1 warnings).
  [exec] 
  [exec] +1 system test framework.  The patch passed system test framework compile.
  [exec] 
  [exec] ==
  [exec] ==
  [exec] Finished build.
  [exec] ==
  [exec] ==
  [exec] 
 BUILD FAILED
 /data/users/rvadali/apache/hadoop-mapred-trunk/build.xml:1740: exec returned: 3
 Total time: 13 minutes 14 seconds
 Test results are in /tmp/rvadali.hadoopQA
 [rvad...@dev502 hadoop-mapred-trunk]$ svn st 
 ?  build-fi
 ?  SecurityAuth.audit
 ?  lib/jdiff/hadoop-mapred_0.22.0-SNAPSHOT.xml
 M  CHANGES.txt
 X  src/test/bin
 {code}




[jira] Commented: (MAPREDUCE-2172) test-patch.properties contains incorrect/version-dependent values of OK_FINDBUGS_WARNINGS and OK_RELEASEAUDIT_WARNINGS

2010-11-05 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928728#action_12928728
 ] 

Ramkumar Vadali commented on MAPREDUCE-2172:


@Nigel, both Patrick and I see ant test-patch fail on a clean checkout. We 
think this is related to HADOOP-7008. Is there some configuration we should 
change before running ant test-patch?

 test-patch.properties contains incorrect/version-dependent values of 
 OK_FINDBUGS_WARNINGS and OK_RELEASEAUDIT_WARNINGS
 --

 Key: MAPREDUCE-2172
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2172
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: FindBugs 1.3.4
Reporter: Patrick Kling

 Running ant test-patch with an empty patch yields 25 findbugs warnings and 3 
 release audit warnings (rather than the 0 findbugs warnings and 1 release 
 audit warning specified in test-patch.properties):
 {code}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no new tests are needed for this patch.
 [exec] Also please list what manual steps were performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 25 new Findbugs warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 3 release audit warnings (more than the trunk's current 1 warnings).
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 {code}




[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node

2010-11-05 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2167:
---

Attachment: MAPREDUCE-2167.2.patch

Using a semaphore now to track the active threads. The logic is much simpler 
now.
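
One way a semaphore could track the active threads is sketched below. This is a guess at the general idea, not the code in MAPREDUCE-2167.2.patch: the class SemaphoreTraversalSketch is hypothetical, and an in-memory map again stands in for HDFS. The caller's thread takes directories from a queue and acquires one permit per directory it hands to the pool; a worker releases its permit only after queuing any sub-directories. The traversal is finished when the queue is empty and the caller can acquire every permit at once.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical sketch; the attached patch may use the semaphore differently. */
class SemaphoreTraversalSketch {

  /**
   * Lists every file under root. tree maps a directory path to the names of
   * its children; a path that is not a key of tree is treated as a file.
   */
  static List<String> listFiles(Map<String, List<String>> tree, String root,
                                int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    BlockingQueue<String> dirs = new LinkedBlockingQueue<>();
    Queue<String> files = new ConcurrentLinkedQueue<>();
    Semaphore active = new Semaphore(numThreads); // one permit per busy worker
    dirs.add(root);
    try {
      while (true) {
        String dir = dirs.poll(10, TimeUnit.MILLISECONDS);
        if (dir == null) {
          // Holding every permit means no worker is running, so the queue
          // cannot grow any further while we check it.
          if (active.tryAcquire(numThreads)) {
            boolean finished = dirs.isEmpty();
            active.release(numThreads);
            if (finished) {
              break;
            }
          }
          continue;
        }
        active.acquire(); // blocks while all workers are busy
        pool.execute(() -> {
          try {
            for (String name : tree.getOrDefault(dir, Collections.emptyList())) {
              String child = dir + "/" + name;
              if (tree.containsKey(child)) {
                dirs.add(child); // sub-directory goes back as a work item
              } else {
                files.add(child);
              }
            }
          } finally {
            active.release(); // released only after sub-directories are queued
          }
        });
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("traversal interrupted", e);
    } finally {
      pool.shutdown();
    }
    List<String> sorted = new ArrayList<>(files);
    Collections.sort(sorted);
    return sorted;
  }
}
```

Only the caller's thread acquires permits, so a worker never waits on the semaphore and the termination check reduces to a single tryAcquire of all permits.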

 Faster directory traversal for raid node
 

 Key: MAPREDUCE-2167
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.patch




