[jira] Updated: (HDFS-1461) Refactor hdfs.server.datanode.BlockSender

2010-11-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1461:
--

Attachment: HDFS-1461.patch

This patch introduces a RAID-friendly constructor for {{BlockSender}}. This 
constructor does not need a {{DataNode}} object and works with streams instead. 
The common code between the new and old constructors is refactored into a 
method {{initialize()}}.

I ran the HDFS unit tests and did not see any new failures compared to trunk.
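As a rough sketch of the shape described above (all class, field, and helper 
names here are hypothetical stand-ins, not the attached patch): a stream-based 
constructor sits next to the DataNode-based one, and both delegate to 
initialize().

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class BlockSenderSketch {
  // Placeholder stand-ins so the sketch is self-contained; not HDFS types.
  interface SomeBlock {}
  interface SomeDataNode {
    InputStream openBlockData(SomeBlock b, long offset) throws IOException;
    InputStream openBlockChecksum(SomeBlock b) throws IOException;
  }

  private DataInputStream blockIn;     // block data stream
  private DataInputStream checksumIn;  // checksum stream
  private long offset;
  private long length;

  /** Existing-style path: a DataNode supplies the streams internally. */
  BlockSenderSketch(SomeDataNode datanode, SomeBlock block, long offset, long length)
      throws IOException {
    initialize(datanode.openBlockData(block, offset),
               datanode.openBlockChecksum(block), offset, length);
  }

  /** RAID-friendly path: the caller passes streams directly, no DataNode needed. */
  BlockSenderSketch(InputStream blockData, InputStream checksumData,
                    long offset, long length) throws IOException {
    initialize(blockData, checksumData, offset, length);
  }

  /** Common setup shared by both constructors. */
  private void initialize(InputStream blockData, InputStream checksumData,
                          long offset, long length) throws IOException {
    this.blockIn = new DataInputStream(blockData);
    this.checksumIn = new DataInputStream(checksumData);
    this.offset = offset;
    this.length = length;
  }
}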

 Refactor hdfs.server.datanode.BlockSender
 -

 Key: HDFS-1461
 URL: https://issues.apache.org/jira/browse/HDFS-1461
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: HDFS-1461.patch


 BlockSender provides the functionality to send a block to a data node. But 
 the current implementation requires the source of the block to be a data 
 node. The RAID contrib project needs the functionality of sending a block to 
 a data node, but cannot use hdfs.server.datanode.BlockSender because the 
 constructor requires a datanode object.
 MAPREDUCE-2132 provides the motivation for this.
 The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to 
 have another constructor that does not need a DataNode object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1461) Refactor hdfs.server.datanode.BlockSender

2010-11-22 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1461:
--

Description: 
BlockSender provides the functionality to send a block to a data node. But the 
current implementation requires the source of the block to be a data node. The 
RAID contrib project needs the functionality of sending a block to a data node, 
but cannot use hdfs.server.datanode.BlockSender because the constructor 
requires a datanode object.

MAPREDUCE-2132 provides the motivation for this.

The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to 
have another constructor that does not need a DataNode object.

  was:
BlockSender provides the functionality to send a block to a data node. But the 
current implementation requires the source of the block to be a data node. The 
RAID contrib project needs the functionality of sending a block to a data node, 
but cannot use hdfs.server.datanode.BlockSender because the constructor 
requires a datanode object.

https://issues.apache.org/jira/browse/MAPREDUCE-2132 provides the motivation 
for this.

The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to 
have another constructor that does not need a DataNode object.


 Refactor hdfs.server.datanode.BlockSender
 -

 Key: HDFS-1461
 URL: https://issues.apache.org/jira/browse/HDFS-1461
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 BlockSender provides the functionality to send a block to a data node. But 
 the current implementation requires the source of the block to be a data 
 node. The RAID contrib project needs the functionality of sending a block to 
 a data node, but cannot use hdfs.server.datanode.BlockSender because the 
 constructor requires a datanode object.
 MAPREDUCE-2132 provides the motivation for this.
 The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to 
 have another constructor that does not need a DataNode object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124

2010-11-22 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934616#action_12934616
 ] 

Ramkumar Vadali commented on HDFS-1257:
---

Hi Konstantin, sorry for the delay in getting back to this.

It seems difficult to come up with a general solution to this, since some 
methods in {{BlockManager}} do fine-grained locking with 
{{namesystem.readLock()/writeLock()}}. In particular, the call to 
{{BlockManager.computeReplicationWork}} that you referred to seems safe because 
of locking inside {{BlockManager.computeReplicationWorkForBlock()}}.

BlockManager has several calls to {{namesystem.readLock()}} and 
{{namesystem.writeLock()}} apart from the one I mentioned. Are you suggesting a 
restructuring of those calls?

 Race condition introduced by HADOOP-5124
 

 Key: HDFS-1257
 URL: https://issues.apache.org/jira/browse/HDFS-1257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Ramkumar Vadali
 Attachments: HDFS-1257.patch


 HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
 But it introduced unprotected access to the data structure 
 recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork 
 accesses recentInvalidateSets without read-lock protection. If there is 
 concurrent activity (like reducing replication on a file) that adds to 
 recentInvalidateSets, the name-node crashes with a 
 ConcurrentModificationException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1457) Limit transmission rate when transferring image between primary and secondary NNs

2010-11-05 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928736#action_12928736
 ] 

Ramkumar Vadali commented on HDFS-1457:
---

@Hairong, I see a lot of release audit warnings in a clean MR checkout too. I 
think this is due to HADOOP-7008. Please see MAPREDUCE-2172 for this.

 Limit transmission rate when transferring image between primary and secondary 
 NNs
 

 Key: HDFS-1457
 URL: https://issues.apache.org/jira/browse/HDFS-1457
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: checkpoint-limitandcompress.patch, 
 trunkThrottleImage.patch, trunkThrottleImage1.patch


 If the fsimage is very large, the network can saturate quickly while the 
 SecondaryNameNode performs a checkpoint, causing the JobTracker's requests to 
 the NameNode for file data to fail during the job initialization phase. So we 
 limit the transmission rate and compress the transfer to resolve the problem. 
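The attached patches are not reproduced here; the following is a generic, 
hypothetical sketch of the rate-limiting idea (not Hadoop's actual throttler 
class): cap the bytes sent per fixed window and sleep once the budget is used.

class TransferThrottlerSketch {
  private final long bytesPerPeriod;   // byte budget per window
  private final long periodMillis;     // window length
  private long windowStart = System.currentTimeMillis();
  private long bytesThisWindow = 0;

  TransferThrottlerSketch(long bytesPerSecond) {
    this.periodMillis = 500;
    this.bytesPerPeriod = bytesPerSecond * periodMillis / 1000;
  }

  /** Call after sending numBytes; blocks if the current window's budget is exhausted. */
  synchronized void throttle(long numBytes) throws InterruptedException {
    bytesThisWindow += numBytes;
    long now = System.currentTimeMillis();
    if (now - windowStart >= periodMillis) {
      // Window elapsed: start a fresh budget.
      windowStart = now;
      bytesThisWindow = 0;
    } else if (bytesThisWindow >= bytesPerPeriod) {
      // Over budget: wait until the window ends before sending more.
      Thread.sleep(periodMillis - (now - windowStart));
      windowStart = System.currentTimeMillis();
      bytesThisWindow = 0;
    }
  }
}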

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1171) RaidNode should fix missing blocks directly on Data Node

2010-10-25 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali resolved HDFS-1171.
---

Resolution: Invalid

Recreating this in Map/Reduce: MAPREDUCE-2150

 RaidNode should fix missing blocks directly on Data Node
 

 Key: HDFS-1171
 URL: https://issues.apache.org/jira/browse/HDFS-1171
 Project: Hadoop HDFS
  Issue Type: Task
  Components: contrib/raid
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 RaidNode currently does not fix missing blocks. The missing blocks have to be 
 fixed manually.
 This task proposes that recovery be more automated:
 1. RaidNode periodically fetches a list of corrupt files from the NameNode
 2. If a corrupt file has a RAID parity file, RaidNode identifies the missing 
 block(s) in the file and recomputes the block(s) using the parity file and 
 other good blocks
 3. RaidNode sends the generated block contents to a DataNode
a. RaidNode chooses a DataNode with the most available space to send the 
 block. 
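As a pseudocode-style sketch of the three steps above, with every interface and 
method name purely hypothetical (not the RaidNode API):

import java.io.IOException;
import java.util.List;

class BlockFixerSketch {
  // Hypothetical interfaces, defined only to make the steps concrete.
  interface NameNodeClient { List<String> fetchCorruptFiles() throws IOException; }
  interface RaidRecovery {
    boolean hasParityFile(String file) throws IOException;
    List<Long> findMissingBlocks(String file) throws IOException;
    byte[] recomputeBlock(String file, long blockId) throws IOException;
  }
  interface DataNodeClient {
    void sendBlock(String file, long blockId, byte[] data) throws IOException;
  }

  void fixOnce(NameNodeClient nn, RaidRecovery raid, DataNodeClient dn) throws IOException {
    for (String corruptFile : nn.fetchCorruptFiles()) {           // 1. ask the NameNode
      if (!raid.hasParityFile(corruptFile)) {
        continue;                                                 // no parity -> cannot fix here
      }
      for (long blockId : raid.findMissingBlocks(corruptFile)) {  // 2. recompute from parity
        byte[] reconstructed = raid.recomputeBlock(corruptFile, blockId);
        dn.sendBlock(corruptFile, blockId, reconstructed);        // 3. ship to a chosen DataNode
      }
    }
  }
}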

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1472) Refactor DFSck to allow programmatic access to output

2010-10-22 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924067#action_12924067
 ] 

Ramkumar Vadali commented on HDFS-1472:
---

Test results:

ant test-patch:


 [exec] +1 overall.
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] +1 tests included.  The patch appears to include 2 new or modified tests.
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] +1 system tests framework.  The patch passed system tests framework compile.
 [exec]
 [exec] ==========================================================
 [exec] Finished build.
 [exec] ==========================================================

ant test:
Some tests failed, but I verified that these fail in a clean checkout as well.
[junit] Test org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery FAILED
[junit] Test org.apache.hadoop.hdfs.TestFileStatus FAILED
[junit] Test org.apache.hadoop.hdfs.TestHDFSTrash FAILED (timeout)
[junit] Test org.apache.hadoop.fs.TestHDFSFileContextMainOperations FAILED
[junit] Test org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery FAILED


 Refactor DFSck to allow programmatic access to output
 -

 Key: HDFS-1472
 URL: https://issues.apache.org/jira/browse/HDFS-1472
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Ramkumar Vadali
 Attachments: HDFS-1472.patch


 DFSck prints the list of corrupt files to stdout. This jira proposes that it 
 write to a PrintStream object that is passed to the constructor. This will 
 allow components like RAID to programmatically get a list of corrupt files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1472) Refactor DFSck to allow programmatic access to output

2010-10-20 Thread Ramkumar Vadali (JIRA)
Refactor DFSck to allow programmatic access to output
-

 Key: HDFS-1472
 URL: https://issues.apache.org/jira/browse/HDFS-1472
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Ramkumar Vadali


DFSck prints the list of corrupt files to stdout. This jira proposes that it 
write to a PrintStream object that is passed to the constructor. This will 
allow components like RAID to programmatically get a list of corrupt files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1472) Refactor DFSck to allow programmatic access to output

2010-10-20 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1472:
--

Attachment: HDFS-1472.patch

Adds a constructor DFSck(Configuration, PrintStream). This is better than 
modifying System.out.
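A hypothetical usage sketch, assuming only the DFSck(Configuration, PrintStream) 
constructor described here; the argument flags and class name are illustrative, 
not part of the patch.

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSck;

public class CorruptFileLister {
  /** Run fsck on a path and return its output as a string instead of printing to stdout. */
  public static String runFsck(Configuration conf, String path) throws Exception {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(buffer, true);
    DFSck fsck = new DFSck(conf, out);   // constructor added by this patch
    fsck.run(new String[] { path });     // output goes to the buffer, not System.out
    return buffer.toString();            // caller can scan this for corrupt file paths
  }
}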

 Refactor DFSck to allow programmatic access to output
 -

 Key: HDFS-1472
 URL: https://issues.apache.org/jira/browse/HDFS-1472
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Ramkumar Vadali
 Attachments: HDFS-1472.patch


 DFSck prints the list of corrupt files to stdout. This jira proposes that it 
 write to a PrintStream object that is passed to the constructor. This will 
 allow components like RAID to programmatically get a list of corrupt files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1472) Refactor DFSck to allow programmatic access to output

2010-10-20 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1472:
--

Status: Patch Available  (was: Open)

 Refactor DFSck to allow programmatic access to output
 -

 Key: HDFS-1472
 URL: https://issues.apache.org/jira/browse/HDFS-1472
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Ramkumar Vadali
 Attachments: HDFS-1472.patch


 DFSck prints the list of corrupt files to stdout. This jira proposes that it 
 write to a PrintStream object that is passed to the constructor. This will 
 allow components like RAID to programmatically get a list of corrupt files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1461) Refactor hdfs.server.datanode.BlockSender

2010-10-18 Thread Ramkumar Vadali (JIRA)
Refactor hdfs.server.datanode.BlockSender
-

 Key: HDFS-1461
 URL: https://issues.apache.org/jira/browse/HDFS-1461
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Ramkumar Vadali


BlockSender provides the functionality to send a block to a data node. But the 
current implementation requires the source of the block to be a data node. The 
RAID contrib project needs the functionality of sending a block to a data node, 
but cannot use hdfs.server.datanode.BlockSender because the constructor 
requires a datanode object.

https://issues.apache.org/jira/browse/MAPREDUCE-2132 provides the motivation 
for this.

The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to 
have another constructor that does not need a DataNode object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1453) Need a command line option in RaidShell to fix blocks using raid

2010-10-13 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali resolved HDFS-1453.
---

Resolution: Invalid

RAID is an MR project; I will reopen this under MR.

 Need a command line option in RaidShell to fix blocks using raid
 

 Key: HDFS-1453
 URL: https://issues.apache.org/jira/browse/HDFS-1453
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali

 RaidShell currently has an option to recover a file and return the path to 
 the recovered file. The administrator can then rename the recovered file to 
 the damaged file.
 The problem with this is that the file metadata is altered, specifically the 
 modification time. Instead we need a way to just repair the damaged blocks 
 and send the fixed blocks to a data node.
 Once this is done, we can put automation around it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1453) Need a command line option in RaidShell to fix blocks using raid

2010-10-12 Thread Ramkumar Vadali (JIRA)
Need a command line option in RaidShell to fix blocks using raid


 Key: HDFS-1453
 URL: https://issues.apache.org/jira/browse/HDFS-1453
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali


RaidShell currently has an option to recover a file and return the path to the 
recovered file. The administrator can then rename the recovered file to the 
damaged file.

The problem with this is that the file metadata is altered, specifically the 
modification time. Instead we need a way to just repair the damaged blocks and 
send the fixed blocks to a data node.

Once this is done, we can put automation around it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-10-04 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917481#action_12917481
 ] 

Ramkumar Vadali commented on HDFS-503:
--

@shravankumar, to get a basic idea of HDFS RAID, you can read Dhruba's blog 
post: 
http://hadoopblog.blogspot.com/2009/08/hdfs-and-erasure-codes-hdfs-raid.html

If you need this for demo purposes, could you use the current hadoop trunk? I 
am not sure about the exact date of the next release. 
To use RAID, you need to create a configuration file and start the RAID daemon. 
You can look for examples in the unit tests, say TestRaidNode.


For further communication, you can contact me directly.

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt


 The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
 file system can be reduced. Keeping three copies of the same data is very 
 costly, especially when the size of storage is huge. One idea is to reduce 
 the replication factor and do erasure coding of a set of blocks so that the 
 overall probability of failure of a block remains the same as before.
 Many forms of error-correcting codes are available, see 
 http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
 described DiskReduce 
 https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
 My opinion is to discuss implementation strategies that are not part of base 
 HDFS, but are a layer on top of HDFS.
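As a toy illustration of the parity idea (plain XOR, not the HDFS RAID code): 
one parity block per stripe lets any single missing block be rebuilt from the 
survivors, so the stripe tolerates a loss without full replication.

public class XorParityDemo {
  /** Compute a parity block as the XOR of all blocks in the stripe. */
  static byte[] xorParity(byte[][] stripe) {
    byte[] parity = new byte[stripe[0].length];
    for (byte[] block : stripe) {
      for (int i = 0; i < parity.length; i++) {
        parity[i] ^= block[i];
      }
    }
    return parity;
  }

  /** Rebuild the block at lostIndex from the parity and the remaining blocks. */
  static byte[] rebuild(byte[][] stripe, int lostIndex, byte[] parity) {
    byte[] recovered = parity.clone();
    for (int b = 0; b < stripe.length; b++) {
      if (b == lostIndex) continue;
      for (int i = 0; i < recovered.length; i++) {
        recovered[i] ^= stripe[b][i];
      }
    }
    return recovered;
  }
}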

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-09-30 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916677#action_12916677
 ] 

Ramkumar Vadali commented on HDFS-503:
--

@shravankumar Quite a few bugs in RAID have been fixed in trunk. This will be 
part of the upcoming hadoop-0.22 release. What do you mean by the RAID API?

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt


 The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
 file system can be reduced. Keeping three copies of the same data is very 
 costly, especially when the size of storage is huge. One idea is to reduce 
 the replication factor and do erasure coding of a set of blocks so that the 
 overall probability of failure of a block remains the same as before.
 Many forms of error-correcting codes are available, see 
 http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
 described DiskReduce 
 https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
 My opinion is to discuss implementation strategies that are not part of base 
 HDFS, but are a layer on top of HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete

2010-08-30 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904431#action_12904431
 ] 

Ramkumar Vadali commented on HDFS-1111:
---

The RaidNode use case at a high level is to identify corrupted data that can be 
fixed by using parity data.

This can be achieved by:
 1. Getting a list of corrupt files and subsequently identifying the corrupt 
blocks in each corrupt file. The current getCorruptFiles() RPC enables getting 
the list of corrupt files. 
-OR-
 2. Getting a list of corrupt files annotated with the corrupt blocks. If this 
patch introduced an RPC with that functionality, it would be an improvement over 
the getCorruptFiles() RPC. 

I have a patch for https://issues.apache.org/jira/browse/HDFS-1171 that depends 
on the getCorruptFiles() RPC, so removal of that RPC with no substitute would 
mean loss of functionality.

 getCorruptFiles() should give some hint that the list is not complete
 -

 Key: HDFS-1111
 URL: https://issues.apache.org/jira/browse/HDFS-1111
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.22.0
Reporter: Rodrigo Schmidt
Assignee: Sriram Rao
 Fix For: 0.22.0

 Attachments: HADFS-1111.0.patch, HDFS-1111-y20.1.patch, 
 HDFS-1111-y20.2.patch, HDFS-1111.trunk.patch


 The list of corrupt files returned by the namenode doesn't say anything if 
 the number of corrupted files is larger than the call output limit (which 
 means the list is not complete). There should be a way to hint incompleteness 
 to clients.
 A simple hack would be to add an extra entry to the array returned with the 
 value null. Clients could interpret this as a sign that there are other 
 corrupt files in the system.
 We should also do some rephrasing of the fsck output to make it more 
 confident when the list is complete and less confident when the list is 
 known to be incomplete.
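A minimal sketch of how a client might interpret the null-sentinel idea above, 
assuming the corrupt-file list arrives as a String[] (a hypothetical shape, not 
the actual RPC signature):

class CorruptListCheck {
  /** True if the server appended a null entry to signal "more corrupt files exist". */
  static boolean isTruncated(String[] corruptFiles) {
    return corruptFiles.length > 0 && corruptFiles[corruptFiles.length - 1] == null;
  }
}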

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1257) Race condition introduced by HADOOP-5124

2010-07-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1257:
--

Attachment: HDFS-1257.patch

Use protected access as suggested by Hairong.

 Race condition introduced by HADOOP-5124
 

 Key: HDFS-1257
 URL: https://issues.apache.org/jira/browse/HDFS-1257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Ramkumar Vadali
 Attachments: HDFS-1257.patch


 HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
 But it introduced unprotected access to the data structure 
 recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork 
 accesses recentInvalidateSets without read-lock protection. If there is 
 concurrent activity (like reducing replication on a file) that adds to 
 recentInvalidateSets, the name-node crashes with a 
 ConcurrentModificationException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124

2010-06-24 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882228#action_12882228
 ] 

Ramkumar Vadali commented on HDFS-1257:
---

I will try to reproduce this with a unit-test, and will update with the results.

 Race condition introduced by HADOOP-5124
 

 Key: HDFS-1257
 URL: https://issues.apache.org/jira/browse/HDFS-1257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Ramkumar Vadali

 HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
 But it introduced unprotected access to the data structure 
 recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork 
 accesses recentInvalidateSets without read-lock protection. If there is 
 concurrent activity (like reducing replication on a file) that adds to 
 recentInvalidateSets, the name-node crashes with a 
 ConcurrentModificationException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124

2010-06-23 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881884#action_12881884
 ] 

Ramkumar Vadali commented on HDFS-1257:
---

I am quite sure it is not a case of a single thread modifying the collection 
while iterating over it. ConcurrentModificationException is typically seen when 
a collection is modified while being iterated, but that is not the case here. 
This is a case of two threads performing reads and writes without protection. 
The collection is not guaranteed to detect the multi-threaded case, but I have 
seen it happen.
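A minimal standalone illustration of that kind of race (ordinary Java, not 
namenode code): one thread iterates a plain TreeMap while another inserts into 
it, which can surface as ConcurrentModificationException even though neither 
thread modifies during its own iteration.

import java.util.Map;
import java.util.TreeMap;

public class CmeDemo {
  public static void main(String[] args) throws InterruptedException {
    Map<Integer, String> map = new TreeMap<>();
    for (int i = 0; i < 1000; i++) map.put(i, "v");

    Thread writer = new Thread(() -> {
      for (int i = 1000; i < 2000; i++) map.put(i, "v");   // concurrent inserts
    });
    writer.start();

    try {
      long sum = 0;
      for (Map.Entry<Integer, String> e : map.entrySet()) { // concurrent iteration
        sum += e.getKey();
      }
      System.out.println("iterated cleanly, sum=" + sum);
    } catch (java.util.ConcurrentModificationException cme) {
      System.out.println("ConcurrentModificationException reproduced");
    }
    writer.join();
  }
}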


 Race condition introduced by HADOOP-5124
 

 Key: HDFS-1257
 URL: https://issues.apache.org/jira/browse/HDFS-1257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Ramkumar Vadali

 HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
 But it introduced unprotected access to the data structure 
 recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork 
 accesses recentInvalidateSets without read-lock protection. If there is 
 concurrent activity (like reducing replication on a file) that adds to 
 recentInvalidateSets, the name-node crashes with a 
 ConcurrentModificationException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124

2010-06-23 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881893#action_12881893
 ] 

Ramkumar Vadali commented on HDFS-1257:
---

My proposal is to wrap FSNamesystem#recentInvalidateSets in 
Collections.synchronizedMap(). That should fix this problem.


--- a/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -189,7 +189,7 @@ public class FSNamesystem implements FSConstants, FSNamesystemMBean, FSClusterSt
   // Mapping: StorageID -> ArrayList<Block>
   //
   private Map<String, Collection<Block>> recentInvalidateSets = 
-    new TreeMap<String, Collection<Block>>();
+    Collections.synchronizedMap(new TreeMap<String, Collection<Block>>());
 
   //
   // Keeps a TreeSet for every named node.  Each treeset contains

 Race condition introduced by HADOOP-5124
 

 Key: HDFS-1257
 URL: https://issues.apache.org/jira/browse/HDFS-1257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Ramkumar Vadali

 HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
 But it introduced unprotected access to the data structure 
 recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork 
 accesses recentInvalidateSets without read-lock protection. If there is 
 concurrent activity (like reducing replication on a file) that adds to 
 recentInvalidateSets, the name-node crashes with a 
 ConcurrentModificationException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1257) Race condition introduced by HADOOP-5124

2010-06-22 Thread Ramkumar Vadali (JIRA)
Race condition introduced by HADOOP-5124


 Key: HDFS-1257
 URL: https://issues.apache.org/jira/browse/HDFS-1257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Ramkumar Vadali


HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
But it introduced unprotected access to the data structure 
recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses 
recentInvalidateSets without read-lock protection. If there is concurrent 
activity (like reducing replication on a file) that adds to 
recentInvalidateSets, the name-node crashes with a 
ConcurrentModificationException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1175) HAR files used for RAID parity need to have configurable partfile size

2010-05-25 Thread Ramkumar Vadali (JIRA)
HAR files used for RAID parity need to have configurable partfile size
--

 Key: HDFS-1175
 URL: https://issues.apache.org/jira/browse/HDFS-1175
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Priority: Minor


RAID parity files are merged into HAR archives periodically. This is required 
to reduce the number of files that the NameNode has to track. The number of 
files in a HAR archive depends on the size of the HAR part files: the higher 
the size, the lower the number of files.
The size of HAR part files is configurable through the setting 
har.partfile.size, but that is a global setting. This task introduces a new 
RAID-specific setting, raid.har.partfile.size, which is used in turn to set 
har.partfile.size.
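A sketch of the forwarding described above; the two key names come from this 
issue, but the surrounding code is an assumption rather than the attached patch.

import org.apache.hadoop.conf.Configuration;

class RaidHarSettings {
  /** If the RAID-specific part-file size is set, use it to set the global HAR key. */
  static void applyRaidPartfileSize(Configuration conf) {
    long raidPartSize = conf.getLong("raid.har.partfile.size",
                                     conf.getLong("har.partfile.size", -1));
    if (raidPartSize > 0) {
      // Only the HAR jobs launched by RAID see this Configuration instance.
      conf.setLong("har.partfile.size", raidPartSize);
    }
  }
}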


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1175) HAR files used for RAID parity need to have configurable partfile size

2010-05-25 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1175:
--

Attachment: HDFS-1175.patch

 HAR files used for RAID parity need to have configurable partfile size
 --

 Key: HDFS-1175
 URL: https://issues.apache.org/jira/browse/HDFS-1175
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Priority: Minor
 Attachments: HDFS-1175.patch


 RAID parity files are merged into HAR archives periodically. This is required 
 to reduce the number of files that the NameNode has to track. The number of 
 files in a HAR archive depends on the size of the HAR part files: the higher 
 the size, the lower the number of files.
 The size of HAR part files is configurable through the setting 
 har.partfile.size, but that is a global setting. This task introduces a new 
 RAID-specific setting, raid.har.partfile.size, which is used in turn to set 
 har.partfile.size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1175) HAR files used for RAID parity need to have configurable partfile size

2010-05-25 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1175:
--

Status: Patch Available  (was: Open)

 HAR files used for RAID parity need to have configurable partfile size
 --

 Key: HDFS-1175
 URL: https://issues.apache.org/jira/browse/HDFS-1175
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Priority: Minor
 Attachments: HDFS-1175.patch


 RAID parity files are merged into HAR archives periodically. This is required 
 to reduce the number of files that the NameNode has to track. The number of 
 files in a HAR archive depends on the size of the HAR part files: the higher 
 the size, the lower the number of files.
 The size of HAR part files is configurable through the setting 
 har.partfile.size, but that is a global setting. This task introduces a new 
 RAID-specific setting, raid.har.partfile.size, which is used in turn to set 
 har.partfile.size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1171) RaidNode should fix missing blocks directly on Data Node

2010-05-21 Thread Ramkumar Vadali (JIRA)
RaidNode should fix missing blocks directly on Data Node


 Key: HDFS-1171
 URL: https://issues.apache.org/jira/browse/HDFS-1171
 Project: Hadoop HDFS
  Issue Type: Task
  Components: contrib/raid
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali


RaidNode currently does not fix missing blocks. The missing blocks have to be 
fixed manually.

This task proposes that recovery be more automated:
1. RaidNode periodically fetches a list of corrupt files from the NameNode
2. If a corrupt file has a RAID parity file, RaidNode identifies the missing 
block(s) in the file and recomputes the block(s) using the parity file and 
other good blocks
3. RaidNode sends the generated block contents to a DataNode
   a. RaidNode chooses a DataNode with the most available space to send the 
block. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1055) Improve thread naming for DataXceivers

2010-04-26 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861067#action_12861067
 ] 

Ramkumar Vadali commented on HDFS-1055:
---

Hi Todd, your patch looks better. Do you plan to merge it soon?

 Improve thread naming for DataXceivers
 --

 Key: HDFS-1055
 URL: https://issues.apache.org/jira/browse/HDFS-1055
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Attachments: dataxceiver.patch, hdfs-1055-branch20.txt


 The DataXceiver threads are named using the default Daemon naming, which is 
 Runnable.toString(). Currently this isn't implemented, so threads have names 
 like org.apache.hadoop.hdfs.server.datanode.DataXceiver@579c9a6b. It would be 
 very handy for debugging (and even ops maybe) to have a better name like 
 DataXceiver for client 1.2.3.4 [reading block_234254242]
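A generic illustration of the idea (plain Java, not the DataNode code): wrap the 
work so the thread carries a descriptive, per-request name while it runs, then 
restores its old name so thread reuse stays clean.

class NamedWorkerDemo {
  /** Wrap a task so the executing thread is named after the client and block it serves. */
  static Runnable describe(final String clientAddr, final String blockName,
                           final Runnable work) {
    return new Runnable() {
      @Override public void run() {
        String oldName = Thread.currentThread().getName();
        Thread.currentThread().setName(
            "DataXceiver for client " + clientAddr + " [reading " + blockName + "]");
        try {
          work.run();
        } finally {
          Thread.currentThread().setName(oldName);  // restore for thread reuse
        }
      }
    };
  }
}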

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1055) Improve thread naming for DataXceivers

2010-04-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated HDFS-1055:
--

Attachment: dataxceiver.patch

 Improve thread naming for DataXceivers
 --

 Key: HDFS-1055
 URL: https://issues.apache.org/jira/browse/HDFS-1055
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Attachments: dataxceiver.patch


 The DataXceiver threads are named using the default Daemon naming, which is 
 Runnable.toString(). Currently this isn't implemented, so threads have names 
 like org.apache.hadoop.hdfs.server.datanode.DataXceiver@579c9a6b. It would be 
 very handy for debugging (and even ops maybe) to have a better name like 
 DataXceiver for client 1.2.3.4 [reading block_234254242]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.