[jira] Updated: (HDFS-1461) Refactor hdfs.server.datanode.BlockSender
[ https://issues.apache.org/jira/browse/HDFS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1461: -- Attachment: HDFS-1461.patch This patch introduces a RAID-friendly constructor for {{BlockSender}}. This constructor does not need a {{DataNode}} object and works with streams instead. The common code between the new and old constructors is refactored into a method {{initialize()}}. I ran the HDFS unit tests and did not see any new failures relative to trunk. Refactor hdfs.server.datanode.BlockSender - Key: HDFS-1461 URL: https://issues.apache.org/jira/browse/HDFS-1461 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali Attachments: HDFS-1461.patch BlockSender provides the functionality to send a block to a data node. But the current implementation requires the source of the block to be a data node. The RAID contrib project needs the functionality of sending a block to a data node, but cannot use hdfs.server.datanode.BlockSender because the constructor requires a datanode object. MAPREDUCE-2132 provides the motivation for this. The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to have another constructor that does not need a DataNode object. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
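The constructor split described above can be sketched as follows. This is an illustrative stand-alone version, not the HDFS-1461 patch itself: only the `BlockSender` and `initialize()` names come from the description, and `availableBlockBytes()` is a toy stand-in for the real block-sending logic.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch of the refactoring: both constructors funnel into a shared
// initialize() method; the new one takes streams directly, so no
// DataNode instance is required.
class BlockSender {
  private InputStream blockIn;     // block data
  private InputStream checksumIn;  // checksum data

  // RAID-friendly constructor: the caller supplies the streams. The
  // original constructor would obtain these streams from a DataNode
  // and then call the same initialize().
  BlockSender(InputStream blockIn, InputStream checksumIn) {
    initialize(blockIn, checksumIn);
  }

  // Common setup shared by both constructors.
  private void initialize(InputStream blockIn, InputStream checksumIn) {
    this.blockIn = blockIn;
    this.checksumIn = checksumIn;
  }

  // Hypothetical helper, standing in for the real sendBlock(): report
  // how many block bytes are immediately available on the stream.
  int availableBlockBytes() throws IOException {
    return blockIn.available();
  }
}
```

With this shape, a RAID component can construct a `BlockSender` from any pair of streams (for example, recomputed block data held in memory) without touching DataNode internals.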
[jira] Updated: (HDFS-1461) Refactor hdfs.server.datanode.BlockSender
[ https://issues.apache.org/jira/browse/HDFS-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1461: -- Description: BlockSender provides the functionality to send a block to a data node. But the current implementation requires the source of the block to be a data node. The RAID contrib project needs the functionality of sending a block to a data node, but cannot use hdfs.server.datanode.BlockSender because the constructor requires a datanode object. MAPREDUCE-2132 provides the motivation for this. The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to have another constructor that does not need a DataNode object. was: BlockSender provides the functionality to send a block to a data node. But the current implementation requires the source of the block to be a data node. The RAID contrib project needs the functionality of sending a block to a data node, but cannot use hdfs.server.datanode.BlockSender because the constructor requires a datanode object. https://issues.apache.org/jira/browse/MAPREDUCE-2132 provides the motivation for this. The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to have another constructor that does not need a DataNode object. Refactor hdfs.server.datanode.BlockSender - Key: HDFS-1461 URL: https://issues.apache.org/jira/browse/HDFS-1461 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali BlockSender provides the functionality to send a block to a data node. But the current implementation requires the source of the block to be a data node. The RAID contrib project needs the functionality of sending a block to a data node, but cannot use hdfs.server.datanode.BlockSender because the constructor requires a datanode object. MAPREDUCE-2132 provides the motivation for this. 
The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to have another constructor that does not need a DataNode object. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124
[ https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934616#action_12934616 ] Ramkumar Vadali commented on HDFS-1257: --- Hi Konstantin, sorry for the delay in getting back to this. It seems difficult to come up with a general solution to this since some methods in {{BlockManager}} do fine-grained locking with {{namesystem.readLock()/writeLock()}}. In particular, the call to {{BlockManager.computeReplicationWork}} that you referred to seems safe because of locking inside {{BlockManager.computeReplicationWorkForBlock()}}. BlockManager has several calls to {{namesystem.readLock()}} and {{namesystem.writeLock()}} apart from the one I mentioned. Are you suggesting a restructuring of those calls? Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali Attachments: HDFS-1257.patch HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1457) Limit transmission rate when transferring image between primary and secondary NNs
[ https://issues.apache.org/jira/browse/HDFS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928736#action_12928736 ] Ramkumar Vadali commented on HDFS-1457: --- @Hairong, I see a lot of release audit warnings in a clean MR checkout too. I think this is due to HADOOP-7008. Please see MAPREDUCE-2172 for this. Limit transmission rate when transferring image between primary and secondary NNs Key: HDFS-1457 URL: https://issues.apache.org/jira/browse/HDFS-1457 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: checkpoint-limitandcompress.patch, trunkThrottleImage.patch, trunkThrottleImage1.patch If the fsimage is very big, the network saturates quickly when the SecondaryNameNode performs a checkpoint, which can cause the JobTracker's requests to the NameNode for file data to fail during the job initialization phase. So we limit the transmission speed and compress the transmission to resolve the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
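The rate-limiting idea in the description can be sketched as a simple bandwidth throttler that the image-copy loop calls after each chunk. This is an illustrative stand-alone version under assumed parameters (a 500 ms accounting period), not the attached patch; the class name and method names below are hypothetical.

```java
// Sketch of transfer throttling: track bytes sent per short period and
// sleep when the per-period budget is exhausted, capping the average
// transfer rate at roughly bytesPerSecond.
class BandwidthThrottler {
  private final long periodMillis = 500;   // accounting period (assumed)
  private final long bytesPerPeriod;       // budget per period
  private long bytesThisPeriod = 0;
  private long periodStart = System.currentTimeMillis();

  BandwidthThrottler(long bytesPerSecond) {
    this.bytesPerPeriod = bytesPerSecond * periodMillis / 1000;
  }

  // Called after sending numBytes; blocks if we are ahead of the budget.
  synchronized void throttle(long numBytes) {
    bytesThisPeriod += numBytes;
    while (bytesThisPeriod >= bytesPerPeriod) {
      long now = System.currentTimeMillis();
      long remaining = periodStart + periodMillis - now;
      if (remaining > 0) {
        try {
          Thread.sleep(remaining);             // wait out the period
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();  // preserve interrupt status
          return;
        }
      } else {
        periodStart = now;                     // start a new period
        bytesThisPeriod = 0;
      }
    }
  }
}
```

The copy loop would call `throttle(chunk.length)` after writing each chunk of the image to the socket; compression of the stream (the other half of the proposal) is orthogonal and not shown.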
[jira] Resolved: (HDFS-1171) RaidNode should fix missing blocks directly on Data Node
[ https://issues.apache.org/jira/browse/HDFS-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali resolved HDFS-1171. --- Resolution: Invalid Recreating this in Map/Reduce: MAPREDUCE-2150 RaidNode should fix missing blocks directly on Data Node Key: HDFS-1171 URL: https://issues.apache.org/jira/browse/HDFS-1171 Project: Hadoop HDFS Issue Type: Task Components: contrib/raid Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Assignee: Ramkumar Vadali RaidNode currently does not fix missing blocks. The missing blocks have to be fixed manually. This task proposes that recovery be more automated:
1. RaidNode periodically fetches a list of corrupt files from the NameNode
2. If a corrupt file has a RAID parity file, RaidNode identifies the missing block(s) in the file and recomputes the block(s) using the parity file and other good blocks
3. RaidNode sends the generated block contents to a DataNode
   a. RaidNode chooses a DataNode with the most available space to receive the block.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1472) Refactor DFSck to allow programmatic access to output
[ https://issues.apache.org/jira/browse/HDFS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924067#action_12924067 ] Ramkumar Vadali commented on HDFS-1472: --- Test results:

ant test-patch:
 [exec] +1 overall.
 [exec] +1 @author. The patch does not contain any @author tags.
 [exec] +1 tests included. The patch appears to include 2 new or modified tests.
 [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
 [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
 [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
 [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
 [exec] +1 system tests framework. The patch passed system tests framework compile.
 [exec] Finished build.

ant test: Some tests failed, but I verified that these fail in a clean checkout as well.
 [junit] Test org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery FAILED
 [junit] Test org.apache.hadoop.hdfs.TestFileStatus FAILED
 [junit] Test org.apache.hadoop.hdfs.TestHDFSTrash FAILED (timeout)
 [junit] Test org.apache.hadoop.fs.TestHDFSFileContextMainOperations FAILED
 [junit] Test org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery FAILED

Refactor DFSck to allow programmatic access to output - Key: HDFS-1472 URL: https://issues.apache.org/jira/browse/HDFS-1472 Project: Hadoop HDFS Issue Type: Improvement Components: tools Reporter: Ramkumar Vadali Attachments: HDFS-1472.patch DFSck prints the list of corrupt files to stdout. This jira proposes that it write to a PrintStream object that is passed to the constructor. This will allow components like RAID to programmatically get a list of corrupt files. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1472) Refactor DFSck to allow programmatic access to output
Refactor DFSck to allow programmatic access to output - Key: HDFS-1472 URL: https://issues.apache.org/jira/browse/HDFS-1472 Project: Hadoop HDFS Issue Type: Improvement Components: tools Reporter: Ramkumar Vadali DFSck prints the list of corrupt files to stdout. This jira proposes that it write to a PrintStream object that is passed to the constructor. This will allow components like RAID to programmatically get a list of corrupt files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1472) Refactor DFSck to allow programmatic access to output
[ https://issues.apache.org/jira/browse/HDFS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1472: -- Attachment: HDFS-1472.patch Adds a constructor DFSck(Configuration, PrintStream). This is better than modifying System.out. Refactor DFSck to allow programmatic access to output - Key: HDFS-1472 URL: https://issues.apache.org/jira/browse/HDFS-1472 Project: Hadoop HDFS Issue Type: Improvement Components: tools Reporter: Ramkumar Vadali Attachments: HDFS-1472.patch DFSck prints the list of corrupt files to stdout. This jira proposes that it write to a PrintStream object that is passed to the constructor. This will allow components like RAID to programmatically get a list of corrupt files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
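The pattern behind the `DFSck(Configuration, PrintStream)` constructor mentioned above can be sketched in isolation. `CorruptFileReporter` below is a hypothetical stand-in for DFSck (so the sketch compiles without Hadoop on the classpath); the point is the injected `PrintStream`, which lets a caller such as RAID capture the report in memory instead of scraping stdout.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Write to an injected PrintStream rather than System.out directly;
// passing System.out preserves the old command-line behavior, while a
// ByteArrayOutputStream-backed stream gives programmatic access.
class CorruptFileReporter {
  private final PrintStream out;

  CorruptFileReporter(PrintStream out) {
    this.out = out;
  }

  // Report one corrupt path to whatever sink the caller chose.
  void report(String path) {
    out.println("CORRUPT: " + path);
  }
}
```

A caller captures the output like this: construct the reporter over `new PrintStream(buffer)`, run the check, then parse `buffer.toString()` for the corrupt paths. This is cleaner than temporarily reassigning `System.out`, which is global, racy, and affects unrelated code.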
[jira] Updated: (HDFS-1472) Refactor DFSck to allow programmatic access to output
[ https://issues.apache.org/jira/browse/HDFS-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1472: -- Status: Patch Available (was: Open) Refactor DFSck to allow programmatic access to output - Key: HDFS-1472 URL: https://issues.apache.org/jira/browse/HDFS-1472 Project: Hadoop HDFS Issue Type: Improvement Components: tools Reporter: Ramkumar Vadali Attachments: HDFS-1472.patch DFSck prints the list of corrupt files to stdout. This jira proposes that it write to a PrintStream object that is passed to the constructor. This will allow components like RAID to programmatically get a list of corrupt files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1461) Refactor hdfs.server.datanode.BlockSender
Refactor hdfs.server.datanode.BlockSender - Key: HDFS-1461 URL: https://issues.apache.org/jira/browse/HDFS-1461 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Ramkumar Vadali BlockSender provides the functionality to send a block to a data node. But the current implementation requires the source of the block to be a data node. The RAID contrib project needs the functionality of sending a block to a data node, but cannot use hdfs.server.datanode.BlockSender because the constructor requires a datanode object. https://issues.apache.org/jira/browse/MAPREDUCE-2132 provides the motivation for this. The purpose of this jira is to refactor hdfs.server.datanode.BlockSender to have another constructor that does not need a DataNode object. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-1453) Need a command line option in RaidShell to fix blocks using raid
[ https://issues.apache.org/jira/browse/HDFS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali resolved HDFS-1453. --- Resolution: Invalid RAID is a MR project, will reopen this under MR. Need a command line option in RaidShell to fix blocks using raid Key: HDFS-1453 URL: https://issues.apache.org/jira/browse/HDFS-1453 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali RaidShell currently has an option to recover a file and return the path to the recovered file. The administrator can then rename the recovered file to the damaged file. The problem with this is that the file metadata is altered, specifically the modification time. Instead we need a way to just repair the damaged blocks and send the fixed blocks to a data node. Once this is done, we can put automation around it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1453) Need a command line option in RaidShell to fix blocks using raid
Need a command line option in RaidShell to fix blocks using raid Key: HDFS-1453 URL: https://issues.apache.org/jira/browse/HDFS-1453 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali RaidShell currently has an option to recover a file and return the path to the recovered file. The administrator can then rename the recovered file to the damaged file. The problem with this is that the file metadata is altered, specifically the modification time. Instead we need a way to just repair the damaged blocks and send the fixed blocks to a data node. Once this is done, we can put automation around it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS
[ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917481#action_12917481 ] Ramkumar Vadali commented on HDFS-503: -- @shravankumar, to get a basic idea of HDFS RAID, you can read Dhruba's blog post http://hadoopblog.blogspot.com/2009/08/hdfs-and-erasure-codes-hdfs-raid.html If you need this for demo purposes, could you use the current hadoop trunk? I am not sure about the exact date of the next release. To use RAID, you need to create a configuration file and start the RAID daemon. You can look for examples in the unit tests, say TestRaidNode. For further communication, you can contact me directly. Implement erasure coding as a layer on HDFS --- Key: HDFS-503 URL: https://issues.apache.org/jira/browse/HDFS-503 Project: Hadoop HDFS Issue Type: New Feature Components: contrib/raid Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.21.0 Attachments: raid1.txt, raid2.txt The goal of this JIRA is to discuss how the cost of raw storage for a HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before. Many forms of error-correcting codes are available, see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt. My opinion is to discuss implementation strategies that are not part of base HDFS, but are instead a layer on top of HDFS. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS
[ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916677#action_12916677 ] Ramkumar Vadali commented on HDFS-503: -- @shravankumar Quite a few bugs in raid have been fixed in trunk. This will be part of the upcoming release hadoop-0.22. What do you mean by the raid API? Implement erasure coding as a layer on HDFS --- Key: HDFS-503 URL: https://issues.apache.org/jira/browse/HDFS-503 Project: Hadoop HDFS Issue Type: New Feature Components: contrib/raid Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.21.0 Attachments: raid1.txt, raid2.txt The goal of this JIRA is to discuss how the cost of raw storage for a HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before. Many forms of error-correcting codes are available, see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt. My opinion is to discuss implementation strategies that are not part of base HDFS, but are instead a layer on top of HDFS. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904431#action_12904431 ] Ramkumar Vadali commented on HDFS-1111: --- The RaidNode use case at a high level is to identify corrupted data that can be fixed by using parity data. This can be achieved by:
1. Getting a list of corrupt files and subsequently identifying the corrupt blocks in each corrupt file. The current getCorruptFiles() RPC enables getting the list of corrupt files. -OR-
2. Getting a list of corrupt files annotated with the corrupt blocks. If this patch introduced an RPC with that functionality, it would be an improvement over the getCorruptFiles() RPC.
I have a patch for https://issues.apache.org/jira/browse/HDFS-1171 that depends on the getCorruptFiles() RPC, so removal of that RPC with no substitute would mean a loss of functionality. getCorruptFiles() should give some hint that the list is not complete - Key: HDFS-1111 URL: https://issues.apache.org/jira/browse/HDFS-1111 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Sriram Rao Fix For: 0.22.0 Attachments: HADFS-1111.0.patch, HDFS-1111-y20.1.patch, HDFS-1111-y20.2.patch, HDFS-1111.trunk.patch The list of corrupt files returned by the namenode doesn't say anything if the number of corrupted files is larger than the call output limit (which means the list is not complete). There should be a way to hint incompleteness to clients. A simple hack would be to add an extra entry to the array returned with the value null. Clients could interpret this as a sign that there are other corrupt files in the system. We should also do some rephrasing of the fsck output to make it more confident when the list is complete and less confident when the list is known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
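The null-sentinel convention proposed in the description (append a `null` entry when the corrupt-file list was truncated) is easy to sketch from the client side. The class and method names below are illustrative, not part of the actual patch.

```java
// Client-side interpretation of the proposed convention: a trailing
// null in the returned array signals that more corrupt files exist
// beyond the call's output limit.
class CorruptFileList {
  // True if the server marked the list as incomplete.
  static boolean isIncomplete(String[] files) {
    return files.length > 0 && files[files.length - 1] == null;
  }

  // The real entries, with the sentinel (if any) stripped.
  static String[] entries(String[] files) {
    if (!isIncomplete(files)) {
      return files;
    }
    String[] out = new String[files.length - 1];
    System.arraycopy(files, 0, out, 0, out.length);
    return out;
  }
}
```

A caller like RaidNode would loop: process `entries(result)`, and if `isIncomplete(result)` is true, fix those files first and then re-issue the RPC to pick up the remainder.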
[jira] Updated: (HDFS-1257) Race condition introduced by HADOOP-5124
[ https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1257: -- Attachment: HDFS-1257.patch Use protected access as suggested by Hairong. Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali Attachments: HDFS-1257.patch HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124
[ https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882228#action_12882228 ] Ramkumar Vadali commented on HDFS-1257: --- I will try to reproduce this with a unit-test, and will update with the results. Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124
[ https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881884#action_12881884 ] Ramkumar Vadali commented on HDFS-1257: --- I am quite sure this is not a case of modifying the collection while iterating over it. ConcurrentModificationException is typically seen in that single-threaded case, but that is not what is happening here. This is a case of two threads performing reads and writes without protection. The collection's fail-fast check is not guaranteed to catch the multi-threaded case, but I have seen it happen. Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1257) Race condition introduced by HADOOP-5124
[ https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881893#action_12881893 ] Ramkumar Vadali commented on HDFS-1257: --- My proposal is to wrap FSNamesystem#recentInvalidateSets in Collections.synchronizedMap(). That should fix this problem.

--- a/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -189,7 +189,7 @@ public class FSNamesystem implements FSConstants, FSNamesystemMBean, FSClusterSt
   //
   // Mapping: StorageID -> ArrayList<Block>
   //
   private Map<String, Collection<Block>> recentInvalidateSets =
-    new TreeMap<String, Collection<Block>>();
+    Collections.synchronizedMap(new TreeMap<String, Collection<Block>>());
   //
   // Keeps a TreeSet for every named node. Each treeset contains

Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
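The proposed fix can be exercised in isolation. The sketch below mirrors the diff's idea (wrap the `TreeMap` in `Collections.synchronizedMap()`), with `Long` block IDs standing in for HDFS `Block` objects and the class/method names invented for the example. One caveat worth keeping in mind: `synchronizedMap` makes individual `get`/`put`/`computeIfAbsent` calls thread-safe, but iteration over the map still requires an explicit `synchronized` block on the map itself, per the `Collections.synchronizedMap` contract.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class InvalidateSets {
  // The fix: wrap the TreeMap so single-entry operations are thread-safe.
  static final Map<String, Collection<Long>> recentInvalidateSets =
      Collections.synchronizedMap(new TreeMap<>());

  // Record a block to invalidate on the given storage.
  static void add(String storageId, long blockId) {
    recentInvalidateSets
        .computeIfAbsent(storageId, k -> new ArrayList<>())
        .add(blockId);
  }

  // Single-key reads are covered by the wrapper; full iteration over
  // entrySet() would additionally need synchronized (recentInvalidateSets).
  static int count(String storageId) {
    Collection<Long> c = recentInvalidateSets.get(storageId);
    return c == null ? 0 : c.size();
  }
}
```

Because of the iteration caveat, `computeInvalidateWork`-style scans over the whole map would still need to hold the map's monitor (or the namesystem lock), which is presumably why the locking-based alternative was also discussed on this issue.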
[jira] Created: (HDFS-1257) Race condition introduced by HADOOP-5124
Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1175) HAR files used for RAID parity need to have configurable partfile size
HAR files used for RAID parity need to have configurable partfile size -- Key: HDFS-1175 URL: https://issues.apache.org/jira/browse/HDFS-1175 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Priority: Minor RAID parity files are merged into HAR archives periodically. This is required to reduce the number of files that the NameNode has to track. The number of files in a HAR archive depends on the size of the HAR part files: the larger the part files, the fewer the files. The size of HAR part files is configurable through the setting har.partfile.size, but that is a global setting. This task introduces a new RAID-specific setting, raid.har.partfile.size, which is used in turn to set har.partfile.size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
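The override behavior described above (a RAID-specific key shadowing the global HAR key) can be sketched with plain `java.util.Properties` standing in for Hadoop's `Configuration`, so the example runs without Hadoop on the classpath. The two key names come from the issue; the 4 GB default and the class name are illustrative assumptions.

```java
import java.util.Properties;

// Resolve the effective HAR part-file size: prefer the RAID-specific
// setting, fall back to the global one, then to a default.
class HarPartfileSize {
  static final long DEFAULT = 4L * 1024 * 1024 * 1024; // assumed default

  static long effectiveSize(Properties conf) {
    String raid = conf.getProperty("raid.har.partfile.size");
    if (raid != null) {
      return Long.parseLong(raid);           // RAID-specific override wins
    }
    String global = conf.getProperty("har.partfile.size");
    return global != null ? Long.parseLong(global) : DEFAULT;
  }
}
```

In the real code the RaidNode would read `raid.har.partfile.size` and then set `har.partfile.size` on the job configuration it hands to the archiver, so the override never leaks into non-RAID HAR jobs.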
[jira] Updated: (HDFS-1175) HAR files used for RAID parity need to have configurable partfile size
[ https://issues.apache.org/jira/browse/HDFS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1175: -- Attachment: HDFS-1175.patch HAR files used for RAID parity need to have configurable partfile size -- Key: HDFS-1175 URL: https://issues.apache.org/jira/browse/HDFS-1175 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Priority: Minor Attachments: HDFS-1175.patch RAID parity files are merged into HAR archives periodically. This is required to reduce the number of files that the NameNode has to track. The number of files in a HAR archive depends on the size of the HAR part files: the larger the part files, the fewer the files. The size of HAR part files is configurable through the setting har.partfile.size, but that is a global setting. This task introduces a new RAID-specific setting, raid.har.partfile.size, which is used in turn to set har.partfile.size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1175) HAR files used for RAID parity need to have configurable partfile size
[ https://issues.apache.org/jira/browse/HDFS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1175: -- Status: Patch Available (was: Open) HAR files used for RAID parity need to have configurable partfile size -- Key: HDFS-1175 URL: https://issues.apache.org/jira/browse/HDFS-1175 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Affects Versions: 0.20.1 Reporter: Ramkumar Vadali Priority: Minor Attachments: HDFS-1175.patch RAID parity files are merged into HAR archives periodically. This is required to reduce the number of files that the NameNode has to track. The number of files in a HAR archive depends on the size of the HAR part files: the larger the part files, the fewer the files. The size of HAR part files is configurable through the setting har.partfile.size, but that is a global setting. This task introduces a new RAID-specific setting, raid.har.partfile.size, which is used in turn to set har.partfile.size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1171) RaidNode should fix missing blocks directly on Data Node
RaidNode should fix missing blocks directly on Data Node Key: HDFS-1171 URL: https://issues.apache.org/jira/browse/HDFS-1171 Project: Hadoop HDFS Issue Type: Task Components: contrib/raid Affects Versions: 0.20.1 Reporter: Ramkumar Vadali RaidNode currently does not fix missing blocks. The missing blocks have to be fixed manually. This task proposes that recovery be more automated:
1. RaidNode periodically fetches a list of corrupt files from the NameNode
2. If a corrupt file has a RAID parity file, RaidNode identifies the missing block(s) in the file and recomputes the block(s) using the parity file and other good blocks
3. RaidNode sends the generated block contents to a DataNode
   a. RaidNode chooses a DataNode with the most available space to receive the block.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
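The three numbered steps above amount to a periodic recovery loop, sketched below. Every interface and method name in this sketch is a hypothetical stand-in (none of these are HDFS APIs), and the missing-block lookup is elided to a single index.

```java
import java.util.List;

// Hypothetical collaborators for the recovery loop.
interface NameNodeClient {
  List<String> getCorruptFiles();                      // step 1
}
interface ParityStore {
  boolean hasParity(String file);                      // step 2: parity exists?
  byte[] recomputeBlock(String file, long blockIndex); // step 2: decode block
}
interface DataNodeClient {
  void sendBlock(String file, long blockIndex, byte[] data); // step 3
}

class BlockFixer {
  // One pass of the loop; returns how many blocks were fixed.
  static int fixOnce(NameNodeClient nn, ParityStore parity, DataNodeClient dn) {
    int fixed = 0;
    for (String file : nn.getCorruptFiles()) {
      if (!parity.hasParity(file)) {
        continue; // no parity file: this corruption cannot be repaired here
      }
      long missingBlock = 0; // index of the missing block (lookup elided)
      byte[] data = parity.recomputeBlock(file, missingBlock);
      dn.sendBlock(file, missingBlock, data);
      fixed++;
    }
    return fixed;
  }
}
```

Step 3a (choosing the DataNode with the most available space) would live inside the `DataNodeClient` implementation's target-selection logic.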
[jira] Commented: (HDFS-1055) Improve thread naming for DataXceivers
[ https://issues.apache.org/jira/browse/HDFS-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861067#action_12861067 ] Ramkumar Vadali commented on HDFS-1055: --- Hi Todd, your patch looks better. Do you plan to merge it soon? Improve thread naming for DataXceivers -- Key: HDFS-1055 URL: https://issues.apache.org/jira/browse/HDFS-1055 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Attachments: dataxceiver.patch, hdfs-1055-branch20.txt The DataXceiver threads are named using the default Daemon naming, which is Runnable.toString(). Currently this isn't implemented, so threads have names like org.apache.hadoop.hdfs.server.datanode.DataXceiver@579c9a6b. It would be very handy for debugging (and even ops maybe) to have a better name like DataXceiver for client 1.2.3.4 [reading block_234254242] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
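The naming improvement requested above can be sketched as a worker that renames its thread while serving a request, instead of relying on the default `ClassName@hashcode` that falls out of `Runnable.toString()`. The `NamedWorker` class is an illustrative stand-in for DataXceiver, not the attached patch.

```java
// Give the serving thread a descriptive, per-request name so thread
// dumps show who is doing what (client address, block being read).
class NamedWorker implements Runnable {
  private final String clientAddr;
  private final String blockId;

  NamedWorker(String clientAddr, String blockId) {
    this.clientAddr = clientAddr;
    this.blockId = blockId;
  }

  @Override
  public void run() {
    // Rename the current thread for the duration of the request; the
    // format mirrors the example in the issue description.
    Thread.currentThread().setName(
        "DataXceiver for client " + clientAddr + " [reading " + blockId + "]");
    // ... serve the request ...
  }
}
```

In a thread dump, these workers then show up as e.g. `DataXceiver for client 1.2.3.4 [reading block_234254242]` rather than an opaque hash, which is exactly the debugging win the issue is after. A production version would also restore or update the name when the request finishes.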
[jira] Updated: (HDFS-1055) Improve thread naming for DataXceivers
[ https://issues.apache.org/jira/browse/HDFS-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated HDFS-1055: -- Attachment: dataxceiver.patch Improve thread naming for DataXceivers -- Key: HDFS-1055 URL: https://issues.apache.org/jira/browse/HDFS-1055 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Attachments: dataxceiver.patch The DataXceiver threads are named using the default Daemon naming, which is Runnable.toString(). Currently this isn't implemented, so threads have names like org.apache.hadoop.hdfs.server.datanode.DataXceiver@579c9a6b. It would be very handy for debugging (and even ops maybe) to have a better name like DataXceiver for client 1.2.3.4 [reading block_234254242] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.