[jira] Updated: (HDFS-1536) Improve HDFS WebUI
[ https://issues.apache.org/jira/browse/HDFS-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1536:

Attachment: missingBlocksWebUI1.patch

missingBlocksWebUI1.patch addressed Nigel's comments.

Improve HDFS WebUI

Key: HDFS-1536
URL: https://issues.apache.org/jira/browse/HDFS-1536
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.23.0
Attachments: missingBlocksWebUI.patch, missingBlocksWebUI1.patch

1. Make the missing blocks count accurate;
2. Make the under-replicated blocks count exclude missing blocks.

This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
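The two changes listed for HDFS-1536 amount to classifying each block by its live replica count, so that a block with no replicas is reported as missing and not also as under-replicated. A minimal sketch, assuming the per-block live and target replica counts are already available as plain arrays; ReplicationCounts and classify are hypothetical names, not the NameNode's actual code:

```java
/**
 * Hypothetical sketch for HDFS-1536: classify blocks by live replica count
 * so that missing blocks (zero live replicas) are reported separately and
 * are not also counted as under-replicated.
 */
final class ReplicationCounts {
    final int missing;
    final int underReplicated;

    private ReplicationCounts(int missing, int underReplicated) {
        this.missing = missing;
        this.underReplicated = underReplicated;
    }

    /**
     * @param liveReplicas live replica count for each block
     * @param expected     target replication for each block, same indexing
     */
    static ReplicationCounts classify(int[] liveReplicas, int[] expected) {
        if (liveReplicas.length != expected.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int missing = 0;
        int under = 0;
        for (int i = 0; i < liveReplicas.length; i++) {
            if (liveReplicas[i] == 0) {
                missing++;  // no usable replica anywhere: missing, not under-replicated
            } else if (liveReplicas[i] < expected[i]) {
                under++;    // some replicas survive, but fewer than the target
            }
        }
        return new ReplicationCounts(missing, under);
    }
}
```

With live counts {0, 1, 3, 2} against a target of 3, this reports one missing block and two under-replicated blocks; the block with zero replicas is not double-counted.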
[jira] Updated: (HDFS-1521) Persist transaction ID on disk between NN restarts
[ https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1521:

Attachment: hdfs-1521.3.txt

This patch switches to logging a txid for every edit and verifying strict sequential ordering on load. I also left the txid in the header; it seemed to me this is advantageous just as something that *must* be there at the top of every edits file. If others disagree, we can take it out. I added some basic tests as well, to ensure we can still read the old format.

Persist transaction ID on disk between NN restarts

Key: HDFS-1521
URL: https://issues.apache.org/jira/browse/HDFS-1521
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: 0.22.0
Attachments: hdfs-1521.3.txt, hdfs-1521.txt, hdfs-1521.txt

For HDFS-1073 and other future work, we'd like to have the concept of a transaction ID that is persisted on disk with the image/edits. We already have this concept in the NameNode, but it resets to 0 on restart. We can also use this txid to replace the _checkpointTime_ field, I believe.
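Verifying strict sequential ordering on load means every txid must be exactly one greater than the previous one. A minimal sketch, assuming the transaction IDs have already been decoded from the edits file into an array; TxIdChecker and verifySequential are hypothetical names, and a real loader would read records from a stream and likely report errors differently:

```java
/**
 * Hypothetical sketch for HDFS-1521: check that transaction IDs read from an
 * edits file increase by exactly one, starting from an expected first ID.
 */
final class TxIdChecker {
    /** Returns the last transaction ID seen, or startTxId - 1 if there were none. */
    static long verifySequential(long startTxId, long[] txIds) {
        long expected = startTxId;
        for (long txid : txIds) {
            if (txid != expected) {
                // A gap or reordering means the log cannot be trusted as-is.
                throw new IllegalStateException(
                        "Expected txid " + expected + " but read " + txid);
            }
            expected++;
        }
        return expected - 1;
    }
}
```

Returning the last txid seen gives the caller the value to resume numbering from after a restart, which is the point of persisting it.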
[jira] Updated: (HDFS-1526) Dfs client name for a map/reduce task should have some randomness
[ https://issues.apache.org/jira/browse/HDFS-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1526:

Resolution: Fixed
Release Note: Make the client name have the format DFSClient_applicationid_randomint_threadid, where applicationid is the value of mapred.task.id, or NONMAPREDUCE if it is not set.
Hadoop Flags: [Incompatible change, Reviewed]
Status: Resolved (was: Patch Available)

I just committed this!

Dfs client name for a map/reduce task should have some randomness

Key: HDFS-1526
URL: https://issues.apache.org/jira/browse/HDFS-1526
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.23.0
Attachments: clientName.patch, randClientId1.patch, randClientId2.patch, randClientId3.patch

Fsck shows that one of the files in our DFS cluster is corrupt:

{code}
/bin/hadoop fsck aFile -files -blocks -locations
aFile: 4633 bytes, 2 block(s):
aFile: CORRUPT block blk_-4597378336099313975
OK
0. blk_-4597378336099313975_2284630101 len=0 repl=3 [...]
1. blk_5024052590403223424_2284630107 len=4633 repl=3 [...]
Status: CORRUPT
{code}

On disk, these two blocks are of the same size and the same content. It turns out that the file was written by a multithreaded map task, and each thread may write to the same file. One possible interleaving of two threads that could cause this:

[T1: create aFile]
[T2: delete aFile]
[T2: create aFile]
[T1: addBlock 0 to aFile]
[T2: addBlock 1 to aFile]
...

Because T1 and T2 have the same client name (the map task ID), they are treated as the same lease holder, so the interleaving above completes without any lease exception and eventually leads to a corrupt file. To solve the problem, a MapReduce task's client name could be formed from its task ID followed by a random number.
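The naming scheme in the release note can be sketched as plain string assembly. This is a minimal sketch, not the committed patch: ClientNames, clientName and forCurrentThread are hypothetical names, and the task ID, which the real client would take from the mapred.task.id setting, is passed in here as an ordinary argument (null standing for "not set").

```java
import java.util.Random;

/**
 * Hypothetical sketch of the HDFS-1526 naming scheme:
 * DFSClient_<applicationid>_<randomint>_<threadid>.
 */
final class ClientNames {
    private static final Random RANDOM = new Random();

    /** applicationId is the task ID when running in a task, else NONMAPREDUCE. */
    static String clientName(String taskId, int randomInt, long threadId) {
        String applicationId = (taskId == null) ? "NONMAPREDUCE" : taskId;
        return "DFSClient_" + applicationId + "_" + randomInt + "_" + threadId;
    }

    /** Each new client gets a fresh random component, so clients of one task no longer share a name. */
    static String forCurrentThread(String taskId) {
        return clientName(taskId, RANDOM.nextInt(), Thread.currentThread().getId());
    }
}
```

Keeping the task ID as a prefix preserves the ability to trace a client back to its task, while the random component breaks the name collision that let two writers share one lease.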
[jira] Commented: (HDFS-1360) TestBlockRecovery should bind ephemeral ports
[ https://issues.apache.org/jira/browse/HDFS-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971407#action_12971407 ]

Patrick Kling commented on HDFS-1360:

+1. I just tested this and it fixes the problem I was seeing.

TestBlockRecovery should bind ephemeral ports

Key: HDFS-1360
URL: https://issues.apache.org/jira/browse/HDFS-1360
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
Attachments: hdfs-1360.txt

TestBlockRecovery starts up a DN, but doesn't configure the various ports to be ephemeral, so the test fails if run on a machine where another DN is already running.
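A fixed port fails exactly as described when another DN already holds it; binding to port 0 asks the OS for any free ephemeral port instead. A minimal, Hadoop-free sketch using java.net.ServerSocket (EphemeralPorts and bindTwo are hypothetical names). In a Hadoop test configuration the same idea usually means giving the DataNode address settings, such as dfs.datanode.address, a value ending in :0; check the exact keys against your Hadoop version.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

/**
 * Sketch of the ephemeral-port idea behind HDFS-1360: binding to port 0 lets
 * the OS pick a free port, so two servers on one machine never collide.
 */
final class EphemeralPorts {
    /** Binds two loopback server sockets to port 0 and returns the ports chosen. */
    static int[] bindTwo() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 50, loopback);
             ServerSocket second = new ServerSocket(0, 50, loopback)) {
            // Both sockets are bound at the same time, so the ports must differ.
            return new int[] {first.getLocalPort(), second.getLocalPort()};
        }
    }
}
```

A test should read the actual port back from the bound server (as getLocalPort does here) rather than assume one, since the value changes from run to run.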
[jira] Updated: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode
[ https://issues.apache.org/jira/browse/HDFS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1476:

Resolution: Fixed
Status: Resolved (was: Patch Available)

I've just committed this. Thanks Patrick!

listCorruptFileBlocks should be functional while the name node is still in safe mode

Key: HDFS-1476
URL: https://issues.apache.org/jira/browse/HDFS-1476
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.23.0
Reporter: Patrick Kling
Assignee: Patrick Kling
Fix For: 0.23.0
Attachments: HDFS-1476.2.patch, HDFS-1476.3.patch, HDFS-1476.4.patch, HDFS-1476.5.patch, HDFS-1476.patch

This would allow us to detect whether missing blocks can be fixed using Raid and, if so, exit safe mode earlier. One way to make listCorruptFileBlocks available before the name node has exited safe mode would be to scan the blocks map on each call to listCorruptFileBlocks, to determine whether there are any blocks with no replicas. This scan could be parallelized by dividing the space of block IDs into multiple intervals that can be scanned independently.
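The parallel scan proposed for HDFS-1476 can be sketched with a sorted map and a thread pool. This is a minimal sketch under stated assumptions: a NavigableMap from block ID to live replica count stands in for the NameNode's blocks map, the map is not modified while the scan runs, the intervals divide the observed ID range evenly, and MissingBlockScan and findMissing are hypothetical names.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch for HDFS-1476: find blocks with no replicas by splitting
 * the block-ID range into intervals and scanning the intervals in parallel.
 * The map (block ID -> live replica count) must not be modified during the scan.
 */
final class MissingBlockScan {
    static List<Long> findMissing(NavigableMap<Long, Integer> replicaCounts, int intervals)
            throws InterruptedException, ExecutionException {
        if (intervals < 1) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        List<Long> missing = new ArrayList<>();
        if (replicaCounts.isEmpty()) {
            return missing;
        }
        // BigInteger avoids overflow: block IDs are signed longs spanning the whole range.
        BigInteger lo = BigInteger.valueOf(replicaCounts.firstKey());
        BigInteger width = BigInteger.valueOf(replicaCounts.lastKey()).subtract(lo).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(intervals);

        ExecutorService pool = Executors.newFixedThreadPool(intervals);
        try {
            List<Future<List<Long>>> parts = new ArrayList<>();
            for (int i = 0; i < intervals; i++) {
                long start = lo.add(width.multiply(BigInteger.valueOf(i)).divide(n)).longValue();
                // The last interval runs to the end of the map; the others stop before the next start.
                NavigableMap<Long, Integer> slice = (i == intervals - 1)
                        ? replicaCounts.tailMap(start, true)
                        : replicaCounts.subMap(start, true,
                                lo.add(width.multiply(BigInteger.valueOf(i + 1)).divide(n)).longValue(), false);
                parts.add(pool.submit(() -> {
                    List<Long> found = new ArrayList<>();
                    slice.forEach((blockId, replicas) -> {
                        if (replicas == 0) {
                            found.add(blockId);
                        }
                    });
                    return found;
                }));
            }
            // Intervals are in ID order, so concatenating keeps the result sorted.
            for (Future<List<Long>> part : parts) {
                missing.addAll(part.get());
            }
        } finally {
            pool.shutdown();
        }
        return missing;
    }
}
```

Splitting by ID value rather than by entry count keeps each interval independent of the others, as the description suggests, at the cost of uneven work if IDs cluster in part of the range.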
[jira] Updated: (HDFS-1360) TestBlockRecovery should bind ephemeral ports
[ https://issues.apache.org/jira/browse/HDFS-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1360:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

I've just committed this. Thanks Todd!

TestBlockRecovery should bind ephemeral ports

Key: HDFS-1360
URL: https://issues.apache.org/jira/browse/HDFS-1360
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
Attachments: hdfs-1360.txt

TestBlockRecovery starts up a DN, but doesn't configure the various ports to be ephemeral, so the test fails if run on a machine where another DN is already running.
[jira] Updated: (HDFS-1360) TestBlockRecovery should bind ephemeral ports
[ https://issues.apache.org/jira/browse/HDFS-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-1360:

Fix Version/s: 0.23.0

TestBlockRecovery should bind ephemeral ports

Key: HDFS-1360
URL: https://issues.apache.org/jira/browse/HDFS-1360
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
Fix For: 0.23.0
Attachments: hdfs-1360.txt

TestBlockRecovery starts up a DN, but doesn't configure the various ports to be ephemeral, so the test fails if run on a machine where another DN is already running.
[jira] Created: (HDFS-1537) Add a metrics for tracking the number of reported corrupt replicas
Add a metrics for tracking the number of reported corrupt replicas

Key: HDFS-1537
URL: https://issues.apache.org/jira/browse/HDFS-1537
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.23.0

We have a cluster in which some of the datanodes' disks are corrupt, but it took us a few days to become aware of the problem. Adding a metric that keeps track of the number of reported corrupt replicas would allow us to raise an alert when an unusual number of corrupt replicas is reported.
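The proposal has two parts: a monotonically increasing counter that the NameNode exports, and an external check that alerts when the counter grows unusually fast. A minimal sketch of both, assuming a simple "more than N new reports since the last poll" rule; CorruptReplicaCounter, CorruptReplicaAlert and the threshold rule are hypothetical and are not Hadoop's metrics API:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter of reported corrupt replicas, as a metric would expose it. */
final class CorruptReplicaCounter {
    private final AtomicLong reported = new AtomicLong();

    /** Called whenever a corrupt replica is reported; safe to call from many threads. */
    void onCorruptReplicaReported() {
        reported.incrementAndGet();
    }

    long total() {
        return reported.get();
    }
}

/** Hypothetical monitor that alerts when the total grows too fast between polls. */
final class CorruptReplicaAlert {
    private final long maxIncreasePerPoll;
    private long lastTotal;

    CorruptReplicaAlert(long maxIncreasePerPoll) {
        this.maxIncreasePerPoll = maxIncreasePerPoll;
    }

    /** Returns true if the counter rose by more than the threshold since the last poll. */
    boolean poll(long currentTotal) {
        long increase = currentTotal - lastTotal;
        lastTotal = currentTotal;
        return increase > maxIncreasePerPoll;
    }
}
```

Alerting on the increase rather than the absolute total keeps a long-running cluster from alerting forever once a few old corruptions have accumulated.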
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Steffl updated HDFS-1448:

Attachment: HDFS-1448-0.22-4.patch

Create multi-format parser for edits logs file, support binary and XML formats initially

Key: HDFS-1448
URL: https://issues.apache.org/jira/browse/HDFS-1448
Project: Hadoop HDFS
Issue Type: New Feature
Components: tools
Affects Versions: 0.22.0
Reporter: Erik Steffl
Assignee: Erik Steffl
Fix For: 0.22.0
Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf

Create a multi-format parser for the edits log file, supporting binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by the FSEditLog class to read and write the edits file.

The primary reason to develop this tool is to help with troubleshooting: the binary format is hard for human troubleshooters to read and edit. In the longer term it could be used to clean up and minimize the parsers for fsimage and edits files.

The edits parser, OfflineEditsViewer, is written in a very similar fashion to OfflineImageViewer. The next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on the adoption of Avro (which would completely change how objects are serialized, as well as provide ways to convert files to different formats).
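The any-to-any requirement is easiest to see on a single record: each format needs a writer and a reader, and converting means reading with one and writing with the other. A minimal sketch, assuming a made-up record of one opcode byte and one path string; the EditRecord class, the XML element names, and the binary layout are hypothetical and do not reproduce FSEditLog's format or OfflineEditsViewer's design:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical edit record (opcode + path) with binary and XML forms, to show
 * lossless conversion between formats. Not the real FSEditLog record layout.
 */
final class EditRecord {
    private static final Pattern XML = Pattern.compile(
            "<RECORD><OPCODE>(-?\\d+)</OPCODE><PATH>([^<]*)</PATH></RECORD>");

    final byte opcode;
    final String path;

    EditRecord(byte opcode, String path) {
        this.opcode = opcode;
        this.path = path;
    }

    byte[] toBinary() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(opcode);
            out.writeUTF(path);  // length-prefixed modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    static EditRecord fromBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new EditRecord(in.readByte(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String toXml() {
        return "<RECORD><OPCODE>" + opcode + "</OPCODE><PATH>" + escape(path) + "</PATH></RECORD>";
    }

    static EditRecord fromXml(String xml) {
        Matcher m = XML.matcher(xml);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized record: " + xml);
        }
        return new EditRecord(Byte.parseByte(m.group(1)), unescape(m.group(2)));
    }

    // Escape '&' first so the entities introduced for '<' and '>' are not re-escaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Decode '&amp;' last so that "&amp;lt;" becomes "&lt;", not "<".
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }
}
```

Because both directions go through the same in-memory record, adding a third format only needs one new reader and writer rather than a converter for every pair of formats.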
[jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971486#action_12971486 ]

Erik Steffl commented on HDFS-1448:

HDFS-1448-0.22-4.patch addresses the points raised in the review from 09/Dec/10 08:06 PM:
https://issues.apache.org/jira/browse/HDFS-1448?focusedCommentId=12970037&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12970037

Create multi-format parser for edits logs file, support binary and XML formats initially

Key: HDFS-1448
URL: https://issues.apache.org/jira/browse/HDFS-1448
Project: Hadoop HDFS
Issue Type: New Feature
Components: tools
Affects Versions: 0.22.0
Reporter: Erik Steffl
Assignee: Erik Steffl
Fix For: 0.22.0
Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf

Create a multi-format parser for the edits log file, supporting binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by the FSEditLog class to read and write the edits file.

The primary reason to develop this tool is to help with troubleshooting: the binary format is hard for human troubleshooters to read and edit. In the longer term it could be used to clean up and minimize the parsers for fsimage and edits files.

The edits parser, OfflineEditsViewer, is written in a very similar fashion to OfflineImageViewer. The next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on the adoption of Avro (which would completely change how objects are serialized, as well as provide ways to convert files to different formats).
[jira] Updated: (HDFS-1477) Make NameNode Reconfigurable.
[ https://issues.apache.org/jira/browse/HDFS-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Kling updated HDFS-1477:

Attachment: HDFS-1477.2.patch

Updated patch, added unit test.

ant test-patch output:

{code}
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
{code}

ant test failures (all these tests also fail on a clean trunk for me):

{code}
[junit] Test org.apache.hadoop.hdfs.TestHDFSServerPorts FAILED
[junit] Test org.apache.hadoop.hdfs.TestHDFSTrash FAILED (timeout)
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestBackupNode FAILED
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestStorageRestore FAILED
[junit] Test org.apache.hadoop.hdfs.TestFileConcurrentReader FAILED (timeout)
{code}

Make NameNode Reconfigurable.

Key: HDFS-1477
URL: https://issues.apache.org/jira/browse/HDFS-1477
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Patrick Kling
Fix For: 0.23.0
Attachments: HDFS-1477.2.patch, HDFS-1477.patch

Modify NameNode to implement the interface Reconfigurable proposed in HADOOP-7001. This would allow us to change certain configuration properties without restarting the name node.
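The idea behind a reconfigurable NameNode is an explicit allow-list: some properties may be changed at runtime, and everything else still requires a restart. A minimal sketch of that shape; SimpleReconfigurable is a hypothetical interface only loosely modeled on the idea proposed in HADOOP-7001 (check that issue for the real method signatures), ToyNameNodeConf is a toy stand-in for a NameNode's configuration, and the property names in the usage below are invented:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical interface, loosely modeled on the idea proposed in HADOOP-7001. */
interface SimpleReconfigurable {
    boolean isPropertyReconfigurable(String property);

    Collection<String> getReconfigurableProperties();

    /** Applies newValue at runtime and returns the previous value. */
    String reconfigureProperty(String property, String newValue);
}

/** Toy stand-in for a NameNode's configuration; not Hadoop's Configuration class. */
final class ToyNameNodeConf implements SimpleReconfigurable {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> reconfigurable;

    ToyNameNodeConf(Map<String, String> initial, String... reconfigurableKeys) {
        values.putAll(initial);
        reconfigurable = new HashSet<>(Arrays.asList(reconfigurableKeys));
    }

    @Override
    public boolean isPropertyReconfigurable(String property) {
        return reconfigurable.contains(property);
    }

    @Override
    public Collection<String> getReconfigurableProperties() {
        return Collections.unmodifiableSet(reconfigurable);
    }

    @Override
    public synchronized String reconfigureProperty(String property, String newValue) {
        // Properties outside the allow-list still need a restart to change.
        if (!isPropertyReconfigurable(property)) {
            throw new IllegalArgumentException(property + " cannot be changed at runtime");
        }
        return values.put(property, newValue);
    }

    synchronized String get(String property) {
        return values.get(property);
    }
}
```

For example, a ToyNameNodeConf built with the hypothetical key "example.tunable" in its allow-list accepts reconfigureProperty("example.tunable", "2") and returns the old value, while any other key is rejected and left unchanged.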
[jira] Commented: (HDFS-1477) Make NameNode Reconfigurable.
[ https://issues.apache.org/jira/browse/HDFS-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971535#action_12971535 ]

Jakob Homan commented on HDFS-1477:

The addInternalServlet call is used for machine-to-machine servlets that users won't use and that are not authenticated (or are authenticated over Kerberos). This servlet should use the standard add call so that in a secure system the user will be authenticated. I haven't reviewed the rest of the patch; I just wanted to throw that out.

Make NameNode Reconfigurable.

Key: HDFS-1477
URL: https://issues.apache.org/jira/browse/HDFS-1477
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Patrick Kling
Assignee: Patrick Kling
Fix For: 0.23.0
Attachments: HDFS-1477.2.patch, HDFS-1477.patch

Modify NameNode to implement the interface Reconfigurable proposed in HADOOP-7001. This would allow us to change certain configuration properties without restarting the name node.
[jira] Commented: (HDFS-733) TestBlockReport fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971559#action_12971559 ]

Todd Lipcon commented on HDFS-733:

Still happening occasionally in trunk, same error as Eli posted above.

TestBlockReport fails intermittently

Key: HDFS-733
URL: https://issues.apache.org/jira/browse/HDFS-733
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Suresh Srinivas
Assignee: Konstantin Boudnik
Fix For: 0.22.0
Attachments: HDFS-733.2.patch, HDFS-733.patch, HDFS-733.patch, HDFS-733.patch, HDFS-733.patch

Details at http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/58/testReport/
[jira] Commented: (HDFS-1521) Persist transaction ID on disk between NN restarts
[ https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971560#action_12971560 ]

Todd Lipcon commented on HDFS-1521:

The current patch, which I uploaded yesterday, passes test-patch and unit tests, though there are one or two pretty trivial TODOs left in it, so I need to upload a final one with those addressed. Before I do so, any review comments?

Persist transaction ID on disk between NN restarts

Key: HDFS-1521
URL: https://issues.apache.org/jira/browse/HDFS-1521
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: 0.22.0
Attachments: hdfs-1521.3.txt, hdfs-1521.txt, hdfs-1521.txt

For HDFS-1073 and other future work, we'd like to have the concept of a transaction ID that is persisted on disk with the image/edits. We already have this concept in the NameNode, but it resets to 0 on restart. We can also use this txid to replace the _checkpointTime_ field, I believe.
[jira] Commented: (HDFS-733) TestBlockReport fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971563#action_12971563 ]

Konstantin Boudnik commented on HDFS-733:

It sure does. It seems that on some occasions the replica either needs more time to become TEMP, or this happens too fast, before the checking thread even kicks in... Damn nasty problem.

TestBlockReport fails intermittently

Key: HDFS-733
URL: https://issues.apache.org/jira/browse/HDFS-733
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Suresh Srinivas
Assignee: Konstantin Boudnik
Fix For: 0.22.0
Attachments: HDFS-733.2.patch, HDFS-733.patch, HDFS-733.patch, HDFS-733.patch, HDFS-733.patch

Details at http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/58/testReport/
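The comment names two timing failure modes. The first, a replica that needs more time to reach the expected state, is commonly handled in tests by polling with a deadline instead of sleeping for a fixed interval. A minimal sketch of that generic technique; WaitUtil and waitFor are hypothetical names, not code from the HDFS-733 patches, and polling cannot catch the second mode, where the state is passed before the checking thread ever looks.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition instead of sleeping a fixed time. */
final class WaitUtil {
    static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Compare nanoTime differences, not absolute values, to stay overflow-safe.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A generous timeout costs nothing when the condition is met quickly, because the helper returns as soon as the check passes.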