[jira] [Commented] (HDFS-2305) Running multiple 2NNs can result in corrupt file system
[ https://issues.apache.org/jira/browse/HDFS-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095810#comment-13095810 ] Hadoop QA commented on HDFS-2305: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12492676/hdfs-2305.0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1186//console This message is automatically generated. Running multiple 2NNs can result in corrupt file system --- Key: HDFS-2305 URL: https://issues.apache.org/jira/browse/HDFS-2305 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.2 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: hdfs-2305-test.patch, hdfs-2305.0.patch Here's the scenario: * You run the NN and 2NN (2NN A) on the same machine. * You don't have the address of the 2NN configured, so it's defaulting to 127.0.0.1. * There's another 2NN (2NN B) running on a second machine. * When a 2NN is done checkpointing, it says "hey NN, I have an updated fsimage for you. You can download it from this URL, which includes my IP address, which is X." And here are the steps that occur to cause this issue: # Some edits happen. # 2NN A (on the NN machine) does a checkpoint. All is dandy. # Some more edits happen. # 2NN B (on a different machine) does a checkpoint. It tells the NN to grab the newly-merged fsimage file from 127.0.0.1. # The NN happily grabs the fsimage from 2NN A (the 2NN on the NN machine), which is stale. # The NN renames the edits.new file to edits. At this point the in-memory FS state is fine, but the on-disk state is missing edits. 
# The next time a 2NN (any 2NN) tries to do a checkpoint, it gets an up-to-date edits file, with an outdated fsimage, and tries to apply those edits to that fsimage. # Kaboom. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
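The failure above hinges on a 2NN advertising the loopback address to the NN. As an illustration only (the exact key and its default vary by release; in the 0.20 line the setting governing the 2NN's HTTP address is dfs.secondary.http.address, and the hostname below is invented), explicitly configuring each 2NN's advertised address keeps two checkpointers from colliding on 127.0.0.1:

```xml
<!-- hdfs-site.xml on the machine running 2NN B (hostname is illustrative) -->
<property>
  <name>dfs.secondary.http.address</name>
  <value>2nn-b.example.com:50090</value>
  <!-- Advertise a routable address instead of the loopback default, so the
       NN fetches the fsimage from the 2NN that actually did the checkpoint. -->
</property>
```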
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095812#comment-13095812 ] Hadoop QA commented on HDFS-2299: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12492646/HDFS-2299.1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1187//console This message is automatically generated. TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
[ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-1973: - Attachment: hdfs-1973.0.patch Here's a preliminary patch (not intended for commit) to give people an idea of how this will work. The main things this is missing before it could reasonably be committed are: # It currently doesn't handle clean-up of fail-over client resources at all. The way RPC resource cleanup currently works is by looking up the appropriate RPCEngine given a protocol class, and leaving it up to that class's InvocationHandler. This implicitly assumes that there is a one-to-one mapping from protocol class to invocation handler, which is no longer true. It's not obvious to me at the moment what's the best way to deal with this. # Currently only one of the {{ClientProtocol}} methods is annotated with the @Idempotent annotation. # It currently doesn't handle concurrent connections at all. HA: HDFS clients must handle namenode failover and switch over to the new active namenode. -- Key: HDFS-1973 URL: https://issues.apache.org/jira/browse/HDFS-1973 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas Assignee: Aaron T. Myers Attachments: hdfs-1973.0.patch During failover, a client must detect the current active namenode's failure and switch over to the new active namenode. The switch-over might make use of IP failover or something more elaborate, such as ZooKeeper, to discover the new active. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
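The approach sketched in the patch (a client-side proxy that retries only @Idempotent-annotated methods against the other namenode) can be illustrated with a plain java.lang.reflect dynamic proxy. This is a sketch only: the annotation, the two-method protocol, and the handler below are invented for illustration and are not the actual Hadoop classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class FailoverSketch {
    // Hypothetical stand-in for the @Idempotent annotation the patch mentions.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Idempotent {}

    // Toy protocol: only getFileLength is safe to retry on the other NN.
    interface ClientProtocol {
        @Idempotent
        long getFileLength(String path);
        void rename(String src, String dst);  // not idempotent
    }

    // Invocation handler that fails over to the next NN, but only for
    // methods marked idempotent; not thread-safe (a sketch, not a client).
    static class FailoverInvocationHandler implements InvocationHandler {
        private final ClientProtocol[] namenodes;
        private int current = 0;

        FailoverInvocationHandler(ClientProtocol... namenodes) {
            this.namenodes = namenodes;
        }

        @Override
        public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
            try {
                return m.invoke(namenodes[current], args);
            } catch (InvocationTargetException e) {
                if (m.isAnnotationPresent(Idempotent.class)) {
                    // Switch to the next namenode and retry once.
                    current = (current + 1) % namenodes.length;
                    return m.invoke(namenodes[current], args);
                }
                throw e.getCause();  // non-idempotent: propagate the failure
            }
        }
    }

    // Two fake namenodes: one down, one healthy.
    static final ClientProtocol FAILING = new ClientProtocol() {
        public long getFileLength(String path) { throw new RuntimeException("NN down"); }
        public void rename(String src, String dst) { throw new RuntimeException("NN down"); }
    };
    static final ClientProtocol HEALTHY = new ClientProtocol() {
        public long getFileLength(String path) { return 42L; }
        public void rename(String src, String dst) {}
    };

    static ClientProtocol newClient(ClientProtocol... nns) {
        return (ClientProtocol) Proxy.newProxyInstance(
            ClientProtocol.class.getClassLoader(),
            new Class<?>[] { ClientProtocol.class },
            new FailoverInvocationHandler(nns));
    }

    public static void main(String[] args) {
        ClientProtocol client = newClient(FAILING, HEALTHY);
        System.out.println(client.getFileLength("/some/file"));  // prints 42
    }
}
```

Non-idempotent methods (rename here) deliberately propagate the failure rather than retrying, since blindly re-invoking a call that may have partially executed could corrupt state; that is exactly why the patch needs every {{ClientProtocol}} method audited for the annotation.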
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095944#comment-13095944 ] Hudson commented on HDFS-2299: -- Integrated in Hadoop-Hdfs-trunk #780 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/780/]) HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G via atm) atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164192 Files : * /hadoop/common/trunk/dev-support/test-patch.properties * /hadoop/common/trunk/hadoop-hdfs-project/dev-support/test-patch.properties * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095956#comment-13095956 ] Hudson commented on HDFS-2299: -- Integrated in Hadoop-Mapreduce-trunk #804 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/804/]) HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G via atm) atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164192 Files : * /hadoop/common/trunk/dev-support/test-patch.properties * /hadoop/common/trunk/hadoop-hdfs-project/dev-support/test-patch.properties * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2299: -- Status: Open (was: Patch Available) TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2299: -- Attachment: HDFS-2299.2.patch TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096013#comment-13096013 ] Uma Maheswara Rao G commented on HDFS-2299: --- Hey Aaron, Yes, I ran test-patch. It did not show any warnings. It looks to me that we need not even add the exclude tag in the RAT configuration item, because HDFS already has that exclusion. Because of that, it did not show any warnings whether or not I added it in common; adding the exclude tag in common is of no use. In the HDFS pom.xml: {code} <exclude>src/test/resources/data*</exclude> <exclude>src/test/resources/editsStored*</exclude> <exclude>src/test/resources/empty-file</exclude> {code} I just removed the Apache header from editsStored.xml. Test-patch results: +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks Uma TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! 
java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2012) Recurring failure of TestBalancer on branch-0.22
[ https://issues.apache.org/jira/browse/HDFS-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096026#comment-13096026 ] Uma Maheswara Rao G commented on HDFS-2012: --- Hi Konstantin/Aaron, It is passing on my local box. Is this test still failing on your local boxes? Thanks Uma Recurring failure of TestBalancer on branch-0.22 Key: HDFS-2012 URL: https://issues.apache.org/jira/browse/HDFS-2012 Project: Hadoop HDFS Issue Type: Bug Components: balancer, test Affects Versions: 0.22.0 Reporter: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 This has been failing on Hudson for the last two builds and fails on my local box as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2302) HDFS logs not being rotated
[ https://issues.apache.org/jira/browse/HDFS-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash resolved HDFS-2302. Resolution: Invalid In commit 8b0430307b662a1533686aeefa0760380b7c5182 the logs are being automated. Marking as invalid HDFS logs not being rotated --- Key: HDFS-2302 URL: https://issues.apache.org/jira/browse/HDFS-2302 Project: Hadoop HDFS Issue Type: Bug Reporter: Ravi Prakash In commit c5edca2b15eca7c0bd568a0017f699ac91b8aebf, the logs for the namenode, datanode and secondarynamenode are being written to .out files and are not being rotated after one day. IMHO rotation of logs is important -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2302) HDFS logs not being rotated
[ https://issues.apache.org/jira/browse/HDFS-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096035#comment-13096035 ] Ravi Prakash commented on HDFS-2302: *rotated I meant HDFS logs not being rotated --- Key: HDFS-2302 URL: https://issues.apache.org/jira/browse/HDFS-2302 Project: Hadoop HDFS Issue Type: Bug Reporter: Ravi Prakash In commit c5edca2b15eca7c0bd568a0017f699ac91b8aebf, the logs for the namenode, datanode and secondarynamenode are being written to .out files and are not being rotated after one day. IMHO rotation of logs is important -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1161) Make DN minimum valid volumes configurable
[ https://issues.apache.org/jira/browse/HDFS-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096099#comment-13096099 ] Eli Collins commented on HDFS-1161: --- That's reasonable. The list of volumes (local dirs) is explicitly configured, so the config isn't portable even when specified as a percent, but it's one less config that isn't portable. IIRC Koji's perspective was that an admin doesn't want to specify the count or percent of valid volumes, but that after a set number of failures the host should be considered faulty. E.g. if it's lost two disks there's probably something wrong whether the host has 6 or 12 disks, i.e. this assumes disk failures w/in a host are correlated. Ideally I think we should collect data (e.g. an X-core host can still function well with Y% of its disks) and not require users to configure this at all - it would be enabled by default and the daemons would take themselves offline when they've determined they don't have sufficient resources. Make DN minimum valid volumes configurable -- Key: HDFS-1161 URL: https://issues.apache.org/jira/browse/HDFS-1161 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.21.0, 0.22.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Blocker Fix For: 0.21.0 Attachments: HDFS-1161-y20.patch, hdfs-1161-1.patch, hdfs-1161-2.patch, hdfs-1161-3.patch, hdfs-1161-4.patch, hdfs-1161-5.patch, hdfs-1161-6.patch The minimum number of non-faulty volumes to keep the DN active is hard-coded to 1. It would be useful to allow users to configure this value so the DN can be taken offline when, e.g., half of its disks fail; otherwise it doesn't get reported until it's down to its final disk and suffering degraded performance. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
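Koji's "after a set number of failures" framing corresponds to a simple count-based knob. As a sketch of what such a setting looks like (the key name below is the one this work converged on in later Hadoop releases, shown here for illustration; check your release's hdfs-default.xml):

```xml
<!-- hdfs-site.xml: let the DN keep running with up to 2 failed volumes;
     a 3rd volume failure takes the datanode offline. The default of 0
     preserves the old behavior of shutting down on any volume failure. -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
```

A count (rather than a percent) matches the "two dead disks means a sick host, whether it has 6 or 12" intuition, at the cost of the config being less portable across heterogeneous hardware.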
[jira] [Resolved] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-2299. -- Resolution: Fixed Thanks a lot for reworking the patch, Uma. I've just committed the latest one. TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096121#comment-13096121 ] Hudson commented on HDFS-2299: -- Integrated in Hadoop-Common-trunk-Commit #824 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/824/]) HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G via atm) atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164624 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096122#comment-13096122 ] Hudson commented on HDFS-2299: -- Integrated in Hadoop-Hdfs-trunk-Commit #901 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/901/]) HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G via atm) atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164624 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2284) RW Http access to HDFS
[ https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-2284: - Attachment: h2284_20110902b.patch h2284_20110902b.patch: A patch for preview. It only has HttpFileSystem (httpfs://) with mkdirs and getFileStatus. RW Http access to HDFS -- Key: HDFS-2284 URL: https://issues.apache.org/jira/browse/HDFS-2284 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Assignee: Tsz Wo (Nicholas), SZE Attachments: h2284_20110902b.patch HFTP allows only read access to HDFS via HTTP. Add RW HTTP access to HDFS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2299) TestOfflineEditsViewer is failing on trunk
[ https://issues.apache.org/jira/browse/HDFS-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096125#comment-13096125 ] Hudson commented on HDFS-2299: -- Integrated in Hadoop-Mapreduce-trunk-Commit #834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/834/]) HDFS-2299. TestOfflineEditsViewer is failing on trunk. (Uma Maheswara Rao G via atm) atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164624 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml TestOfflineEditsViewer is failing on trunk -- Key: HDFS-2299 URL: https://issues.apache.org/jira/browse/HDFS-2299 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.24.0 Reporter: Aaron T. Myers Assignee: Uma Maheswara Rao G Fix For: 0.24.0 Attachments: HDFS-2299.1.patch, HDFS-2299.2.patch, HDFS-2299.patch, HDFS-2299.patch The relevant bit of the error: {noformat} --- Test set: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.652 sec FAILURE! testStored(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 0.038 sec FAILURE! java.lang.AssertionError: Reference XML edits and parsed to XML should be same {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2307) More Coverage needed for FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-2307: -- Fix Version/s: 0.22.0 More Coverage needed for FSDirectory Key: HDFS-2307 URL: https://issues.apache.org/jira/browse/HDFS-2307 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.22.0 Reporter: Benoy Antony Fix For: 0.22.0 Attachments: 59.html The unit tests do not cover some of the symlink logic in FSDirectory. The impact of adding a symlink on the nameQuota is not covered. The unit test coverage for FSDirectory is attached. The uncovered lines are in the addToParent function. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2284) RW Http access to HDFS
[ https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096152#comment-13096152 ] Todd Lipcon commented on HDFS-2284: --- If this is HDFS-specific, could we make the scheme something like hdfs+http:// or http+hdfs:// to indicate the encapsulation? I always found "hftp://" to be very confusing to users who thought it had something to do with FTP. I can see users being equally confused if they try to do hadoop fs -cat httpfs://myserver/path/to/tarball.tgz. RW Http access to HDFS -- Key: HDFS-2284 URL: https://issues.apache.org/jira/browse/HDFS-2284 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Assignee: Tsz Wo (Nicholas), SZE Attachments: h2284_20110902b.patch HFTP allows only read access to HDFS via HTTP. Add RW HTTP access to HDFS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1779) After NameNode restart, Clients can not read partial files even after client invokes Sync.
[ https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1779: Attachment: bbwReportAppend.patch Here comes the patch: 1. It adds a new RPC blocksBeingWrittenReport that allows a datanode to send a report of bbw blocks. 2. The DataNode sends a bbw block report when registering with the NameNode. 3. FSDatasetInterface is enhanced with a getBlocksBeingWrittenReport API. After NameNode restart, Clients can not read partial files even after client invokes Sync. --- Key: HDFS-1779 URL: https://issues.apache.org/jira/browse/HDFS-1779 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.20-append Environment: Linux Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 0.20-append Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch In the Append (HDFS-200) issue: if a file has 10 blocks and the client invokes the sync method after writing 5 blocks, the NN will persist the block information in edits. If we then restart the NN, all the DataNodes will reregister with the NN, but the DataNodes do not send their blocks-being-written information to the NN at that point; DNs only send the blocksBeingWritten information at DN startup. So the NameNode cannot determine which datanodes the 5 persisted blocks belong to. This information can be built based on block reports from the DNs. Otherwise we will lose the information for these 5 blocks even though the NN persisted it in edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
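The three parts of the patch summary above fit together as a simple registration-time flow. The sketch below mirrors the names in the summary (blocksBeingWrittenReport, getBlocksBeingWrittenReport), but the signatures and classes are invented for illustration and are NOT the real Hadoop 0.20-append interfaces.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BbwReportSketch {

    // 3. FSDatasetInterface is enhanced with a bbw-report API
    //    (here: just the IDs of blocks still under construction).
    interface FSDataset {
        List<Long> getBlocksBeingWrittenReport();
    }

    // 1. New NameNode-side RPC accepting a report of bbw blocks.
    static class NameNode {
        final Map<Long, String> bbwBlockToDatanode = new HashMap<>();

        void blocksBeingWrittenReport(String datanodeId, List<Long> blockIds) {
            // Re-associate blocks persisted in edits with the reporting DN,
            // so a restarted NN knows where the partial replicas live.
            for (long id : blockIds) {
                bbwBlockToDatanode.put(id, datanodeId);
            }
        }
    }

    // 2. The DataNode sends its bbw report when registering with the NN.
    static class DataNode {
        final String id;
        final FSDataset dataset;

        DataNode(String id, FSDataset dataset) {
            this.id = id;
            this.dataset = dataset;
        }

        void register(NameNode nn) {
            nn.blocksBeingWrittenReport(id, dataset.getBlocksBeingWrittenReport());
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        FSDataset ds = () -> List.of(101L, 102L);  // two blocks mid-write
        new DataNode("dn1", ds).register(nn);
        System.out.println(nn.bbwBlockToDatanode.get(101L));  // prints dn1
    }
}
```

The point of sending the report at registration (not just at DN startup) is exactly the gap the issue describes: after an NN restart, re-registration is the first chance for the NN to relearn which datanodes hold the blocks it persisted via sync.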
[jira] [Commented] (HDFS-1779) After NameNode restart, Clients can not read partial files even after client invokes Sync.
[ https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096163#comment-13096163 ] Hadoop QA commented on HDFS-1779: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12492778/bbwReportAppend.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1188//console This message is automatically generated. After NameNode restart, Clients can not read partial files even after client invokes Sync. --- Key: HDFS-1779 URL: https://issues.apache.org/jira/browse/HDFS-1779 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.20-append Environment: Linux Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 0.20-append Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch In the Append (HDFS-200) issue: if a file has 10 blocks and the client invokes the sync method after writing 5 blocks, the NN will persist the block information in edits. If we then restart the NN, all the DataNodes will reregister with the NN, but the DataNodes do not send their blocks-being-written information to the NN at that point; DNs only send the blocksBeingWritten information at DN startup. So the NameNode cannot determine which datanodes the 5 persisted blocks belong to. This information can be built based on block reports from the DNs. Otherwise we will lose the information for these 5 blocks even though the NN persisted it in edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place
[ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096160#comment-13096160 ] Jitendra Nath Pandey commented on HDFS-2018: Todd, are you ok with committing this now? The patch is in line with what we had agreed on before. 1073: Move all journal stream management code into one place Key: HDFS-2018 URL: https://issues.apache.org/jira/browse/HDFS-2018 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 0.23.0 Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, hdfs-2018-otherapi.txt, hdfs-2018.txt Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager and the code for input streams is in the inspectors. This change does a number of things. - Input and output streams are now created by the JournalManager. - FSImageStorageInspectors now deal with URIs when referring to edit logs. - Recovery of inprogress logs is performed by counting the number of transactions instead of looking at the length of the file. The patch for this applies on top of the HDFS-1073 branch + the HDFS-2003 patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
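The recovery change mentioned above (counting transactions rather than trusting the file length, which may include a partially written tail) can be sketched as follows. The record layout here (an int opcode followed by a long txid, with -1 as an end-of-log marker) is invented for the illustration and is not the actual edit-log format.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: recover an in-progress edit log by counting how many
// complete transactions can be read, instead of relying on file length.
public class InprogressRecoverySketch {
    static long countValidTransactions(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        long count = 0;
        try {
            while (true) {
                int op = din.readInt();  // opcode (invented layout)
                if (op == -1) {
                    break;               // explicit end-of-log marker
                }
                din.readLong();          // txid
                count++;                 // a full record was read
            }
        } catch (EOFException e) {
            // Truncated tail from a crash mid-write: every record read
            // before the truncation is still a valid transaction.
        }
        return count;
    }
}
```

Counting complete records tolerates a half-written final transaction, whereas trusting the on-disk length would treat the garbage tail as data.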
[jira] [Updated] (HDFS-2232) TestHDFSCLI fails on 0.22 branch
[ https://issues.apache.org/jira/browse/HDFS-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-2232: --- Attachment: HDFS-2232.patch Patch for trunk. TestHDFSCLI fails on 0.22 branch Key: HDFS-2232 URL: https://issues.apache.org/jira/browse/HDFS-2232 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Priority: Blocker Fix For: 0.22.0 Attachments: HDFS-2232.patch, HDFS-2232.patch, HDFS-2232.patch, HDFS-2232.patch, TEST-org.apache.hadoop.cli.TestHDFSCLI.txt, TEST-org.apache.hadoop.cli.TestHDFSCLI.txt Several HDFS CLI tests fail on 0.22 branch. I can see 3 reasons: # Regular expressions for host names and paths that are not generic enough. Similar to MAPREDUCE-2304. # Some command outputs have a new-line at the end. # And some seem to produce [much] more output than expected. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2281) NPE in checkpoint during processIOError()
[ https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2281: -- Status: Open (was: Patch Available) NPE in checkpoint during processIOError() - Key: HDFS-2281 URL: https://issues.apache.org/jira/browse/HDFS-2281 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Uma Maheswara Rao G Fix For: 0.22.0 Attachments: BN-bug-NPE.txt, HDFS-2281.patch At the end of checkpoint BackupNode tries to convergeJournalSpool() and calls revertFileStreams(). The latter closes each file stream, and tries to rename the corresponding file to its permanent location current/edits. If for any reason the rename fails processIOError() is called for failed streams. processIOError() will try to close the stream again and will get NPE in EditLogFileOutputStream.close() because bufCurrent was set to null by the previous close. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
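The double-close failure in HDFS-2281 above can be sketched in a few lines. This is an illustrative model, not the real EditLogFileOutputStream: the first close() releases the buffer, a second close() from processIOError() then dereferences null, and a simple guard making close idempotent avoids the NPE:

```java
// Toy model of the HDFS-2281 failure mode (hypothetical class, not the
// real Hadoop stream). closeUnsafe() mirrors the reported behavior:
// close nulls the buffer, so a second close hits a NullPointerException.
class FragileStream {
    StringBuilder bufCurrent = new StringBuilder();

    void closeUnsafe() {
        bufCurrent.setLength(0);        // flush buffered edits
        bufCurrent = null;              // release the buffer
    }

    // A guarded close is idempotent: calling it again is a no-op.
    void closeSafe() {
        if (bufCurrent == null) return; // already closed
        bufCurrent.setLength(0);
        bufCurrent = null;
    }
}
```

In the real code path the second close comes from processIOError() after a failed rename; the fix direction is the same idempotency check.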
[jira] [Commented] (HDFS-2284) RW Http access to HDFS
[ https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096166#comment-13096166 ] Allen Wittenauer commented on HDFS-2284: webhdfs:// (just like webnfs://) RW Http access to HDFS -- Key: HDFS-2284 URL: https://issues.apache.org/jira/browse/HDFS-2284 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Assignee: Tsz Wo (Nicholas), SZE Attachments: h2284_20110902b.patch HFTP allows only read access to HDFS via HTTP. Add RW HTTP access to HDFS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2281) NPE in checkpoint during processIOError()
[ https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2281: -- Attachment: HDFS-2281.1.patch NPE in checkpoint during processIOError() - Key: HDFS-2281 URL: https://issues.apache.org/jira/browse/HDFS-2281 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Uma Maheswara Rao G Fix For: 0.22.0 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch At the end of checkpoint BackupNode tries to convergeJournalSpool() and calls revertFileStreams(). The latter closes each file stream, and tries to rename the corresponding file to its permanent location current/edits. If for any reason the rename fails processIOError() is called for failed streams. processIOError() will try to close the stream again and will get NPE in EditLogFileOutputStream.close() because bufCurrent was set to null by the previous close. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2281) NPE in checkpoint during processIOError()
[ https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2281: -- Status: Patch Available (was: Open) Hi Konstantin, Thanks a lot for taking a look. This code has been re-factored in trunk, so I included those changes as part of the port. We can make use of processIOError alone, so I used processIOError for handling the failed storage directories. Konstantin, can you take a look? Test-patch results: [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. == Finished == Thanks Uma NPE in checkpoint during processIOError() - Key: HDFS-2281 URL: https://issues.apache.org/jira/browse/HDFS-2281 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Uma Maheswara Rao G Fix For: 0.22.0 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch At the end of checkpoint BackupNode tries to convergeJournalSpool() and calls revertFileStreams(). The latter closes each file stream, and tries to rename the corresponding file to its permanent location current/edits. If for any reason the rename fails, processIOError() is called for failed streams. processIOError() will try to close the stream again and will get NPE in EditLogFileOutputStream.close() because bufCurrent was set to null by the previous close. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place
[ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096175#comment-13096175 ] Todd Lipcon commented on HDFS-2018: --- OK. I still think it's a worse API than the other patch I had attached a while back, for the reasons mentioned above, but I won't block it. Go ahead and commit. 1073: Move all journal stream management code into one place Key: HDFS-2018 URL: https://issues.apache.org/jira/browse/HDFS-2018 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 0.23.0 Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, hdfs-2018-otherapi.txt, hdfs-2018.txt Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager and the code for input streams is in the inspectors. This change does a number of things. - Input and Output streams are now created by the JournalManager. - FSImageStorageInspectors now deals with URIs when referring to edit logs - Recovery of inprogress logs is performed by counting the number of transactions instead of looking at the length of the file. The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2281) NPE in checkpoint during processIOError()
[ https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096177#comment-13096177 ] Hadoop QA commented on HDFS-2281: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12492780/HDFS-2281.1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1190//console This message is automatically generated. NPE in checkpoint during processIOError() - Key: HDFS-2281 URL: https://issues.apache.org/jira/browse/HDFS-2281 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Uma Maheswara Rao G Fix For: 0.22.0 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch At the end of checkpoint BackupNode tries to convergeJournalSpool() and calls revertFileStreams(). The latter closes each file stream, and tries to rename the corresponding file to its permanent location current/edits. If for any reason the rename fails processIOError() is called for failed streams. processIOError() will try to close the stream again and will get NPE in EditLogFileOutputStream.close() because bufCurrent was set to null by the previous close. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2284) RW Http access to HDFS
[ https://issues.apache.org/jira/browse/HDFS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096182#comment-13096182 ] Alejandro Abdelnur commented on HDFS-2284: -- @Nicholas, I didn't see a follow-up on my comment about Hoop being used as the next HFTP as well. RW Http access to HDFS -- Key: HDFS-2284 URL: https://issues.apache.org/jira/browse/HDFS-2284 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Assignee: Tsz Wo (Nicholas), SZE Attachments: h2284_20110902b.patch HFTP allows only read access to HDFS via HTTP. Add RW HTTP access to HDFS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-362) FSEditLog should not write long and short as UTF8 and should not use ArrayWritable for writing non-array items
[ https://issues.apache.org/jira/browse/HDFS-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-362: - Affects Version/s: 0.23.0 FSEditLog should not write long and short as UTF8 and should not use ArrayWritable for writing non-array items --- Key: HDFS-362 URL: https://issues.apache.org/jira/browse/HDFS-362 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Uma Maheswara Rao G Attachments: HDFS-362.1.patch, HDFS-362.2.patch, HDFS-362.2b.patch, HDFS-362.2c.patch, HDFS-362.2d.patch, HDFS-362.2d.patch, HDFS-362.patch In FSEditLog, - long and short are first converted to String and then further converted to UTF8 - For some non-array items, it first creates an ArrayWritable object to hold all the items and then writes the ArrayWritable object. This results in creating many intermediate objects, which affects Namenode CPU performance and Namenode restart. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
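The two encodings contrasted in HDFS-362 above can be shown side by side. This is an illustrative sketch (hypothetical class, not FSEditLog itself): serializing a long as a length-prefixed UTF-8 string creates intermediate String objects and takes a variable number of bytes, while writing it directly takes a fixed 8 bytes and no intermediate objects:

```java
import java.io.*;

// Sketch of the HDFS-362 point: the same long, serialized two ways.
class EditLogEncodingSketch {
    // The old style: long -> String -> modified UTF-8 with a 2-byte
    // length prefix (intermediate objects, variable size).
    static byte[] longAsString(long v) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(Long.toString(v));
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // The proposed style: the long written directly as 8 raw bytes.
    static byte[] longRaw(long v) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeLong(v);
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

For a nine-digit transaction id the text form costs 11 bytes (2-byte length prefix plus 9 ASCII digits) and several temporary objects; the raw form is always 8 bytes with none.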
[jira] [Updated] (HDFS-962) Make DFSOutputStream MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Joseph updated HDFS-962: --- Status: Open (was: Patch Available) Make DFSOutputStream MAX_PACKETS configurable - Key: HDFS-962 URL: https://issues.apache.org/jira/browse/HDFS-962 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Justin Joseph Priority: Minor Attachments: HDFS-962.1.patch, HDFS-962.patch HDFS-959 suggests that the MAX_PACKETS variable (which determines how many outstanding data packets the DFSOutputStream will permit) may have an impact on performance. If so, we should make it configurable to trade off between memory and performance. I think it ought to be a secret/undocumented config for now - this will make it easier to benchmark without confusing users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-962) Make DFSOutputStream MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Joseph updated HDFS-962: --- Attachment: HDFS-962.1.patch Make DFSOutputStream MAX_PACKETS configurable - Key: HDFS-962 URL: https://issues.apache.org/jira/browse/HDFS-962 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Justin Joseph Priority: Minor Attachments: HDFS-962.1.patch, HDFS-962.patch HDFS-959 suggests that the MAX_PACKETS variable (which determines how many outstanding data packets the DFSOutputStream will permit) may have an impact on performance. If so, we should make it configurable to trade off between memory and performance. I think it ought to be a secret/undocumented config for now - this will make it easier to benchmark without confusing users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-962) Make DFSOutputStream MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096201#comment-13096201 ] Justin Joseph commented on HDFS-962: Hi Nicholas, Thanks a lot for taking a look on this patch! I have addressed your comments. {quote} * change dfsMaxPackets to private and non-static. {quote} Done {quote} * define constants for dfs.max.packets in DFSConfigKeys {quote} Added in DFSConfigKeys. Thanks Justin Make DFSOutputStream MAX_PACKETS configurable - Key: HDFS-962 URL: https://issues.apache.org/jira/browse/HDFS-962 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Justin Joseph Priority: Minor Attachments: HDFS-962.1.patch, HDFS-962.patch HDFS-959 suggests that the MAX_PACKETS variable (which determines how many outstanding data packets the DFSOutputStream will permit) may have an impact on performance. If so, we should make it configurable to trade off between memory and performance. I think it ought to be a secret/undocumented config for now - this will make it easier to benchmark without confusing users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
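The change discussed in HDFS-962 above — replacing the hard-coded MAX_PACKETS constant with a configurable value — can be sketched as follows. This is illustrative only: the real patch uses Hadoop's Configuration class and DFSConfigKeys, while java.util.Properties stands in here; the key name dfs.max.packets comes from the review comments, and the default of 80 is an assumption:

```java
import java.util.Properties;

// Sketch of making MAX_PACKETS configurable (HDFS-962). Properties
// stands in for Hadoop's Configuration; the default value is assumed.
class PacketConfigSketch {
    static final String DFS_MAX_PACKETS_KEY = "dfs.max.packets";
    static final int DFS_MAX_PACKETS_DEFAULT = 80;

    // Private and non-static, per the review comments in the thread.
    private final int dfsMaxPackets;

    PacketConfigSketch(Properties conf) {
        String v = conf.getProperty(DFS_MAX_PACKETS_KEY);
        this.dfsMaxPackets = (v == null) ? DFS_MAX_PACKETS_DEFAULT : Integer.parseInt(v);
    }

    int maxPackets() { return dfsMaxPackets; }
}
```

An unset key falls back to the default, which keeps the knob "secret/undocumented" as Todd suggested: only users who know the key ever change the behavior.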
[jira] [Updated] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-200: -- Attachment: HDFS-200.20-security.1.patch Patch for 20-security branch. In HDFS, sync() not yet guarantees data available to the new readers Key: HDFS-200 URL: https://issues.apache.org/jira/browse/HDFS-200 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.20-append Reporter: Tsz Wo (Nicholas), SZE Assignee: dhruba borthakur Priority: Blocker Fix For: 0.20-append Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, namenode.log, namenode.log, reopen_test.sh In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-142: -- Attachment: HDFS-142.20-security.1.patch Patch for 20-security branch uploaded. In 0.20, move blocks being written into a blocksBeingWritten directory -- Key: HDFS-142 URL: https://issues.apache.org/jira/browse/HDFS-142 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20-append Reporter: Raghu Angadi Assignee: dhruba borthakur Priority: Blocker Fix For: 0.20-append Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, HDFS-142-multiple-blocks-datanode-exception.patch, HDFS-142.20-security.1.patch, HDFS-142_20-append2.patch, HDFS-142_20.patch, appendFile-recheck-lease.txt, appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, hdfs-142-commitBlockSynchronization-unknown-datanode.txt, hdfs-142-minidfs-fix-from-409.txt, hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, recover-rbw-v2.txt, testfileappend4-deaddn.txt, validateBlockMetaData-synchronized.txt Before 0.18, when Datanode restarts, it deletes files under data-dir/tmp directory since these files are not valid anymore. But in 0.18 it moves these files to normal directory incorrectly making them valid blocks. One of the following would work : - remove the tmp files during upgrade, or - if the files under /tmp are in pre-18 format (i.e. no generation), delete them. Currently effect of this bug is that, these files end up failing block verification and eventually get deleted. But cause incorrect over-replication at the namenode before that. Also it looks like our policy regd treating files under tmp needs to be defined better. Right now there are probably one or two more bugs with it. 
Dhruba, please file them if you remember. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2232) TestHDFSCLI fails on 0.22 branch
[ https://issues.apache.org/jira/browse/HDFS-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096216#comment-13096216 ] Hadoop QA commented on HDFS-2232: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12492779/HDFS-2232.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 131 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestDfsOverAvroRpc org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap org.apache.hadoop.hdfs.server.datanode.TestReplicasMap +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1189//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1189//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1189//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1189//console This message is automatically generated. 
TestHDFSCLI fails on 0.22 branch Key: HDFS-2232 URL: https://issues.apache.org/jira/browse/HDFS-2232 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Priority: Blocker Fix For: 0.22.0 Attachments: HDFS-2232.patch, HDFS-2232.patch, HDFS-2232.patch, HDFS-2232.patch, TEST-org.apache.hadoop.cli.TestHDFSCLI.txt, TEST-org.apache.hadoop.cli.TestHDFSCLI.txt Several HDFS CLI tests fail on 0.22 branch. I can see 3 reasons: # Regular expressions for host names and paths that are not generic enough. Similar to MAPREDUCE-2304. # Some command outputs have a new-line at the end. # And some seem to produce [much] more output than expected. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2189) guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096242#comment-13096242 ] Joep Rottinghuis commented on HDFS-2189: The integration build should be kicked off, because the last published POM still has the erroneous reference to org.apache.hadooip#guava. This is failing downstream builds. See HBASE-4327. guava-r09 dependency missing from ivy/hadoop-hdfs-template.xml in HDFS. - Key: HDFS-2189 URL: https://issues.apache.org/jira/browse/HDFS-2189 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Assignee: Plamen Jeliazkov Priority: Blocker Fix For: 0.22.0 Attachments: HDFS-2189-1.patch, patch.txt Corrected version of: https://issues.apache.org/jira/browse/MAPREDUCE-2627 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096247#comment-13096247 ] Suresh Srinivas commented on HDFS-200: -- +1 for the patch. In HDFS, sync() not yet guarantees data available to the new readers Key: HDFS-200 URL: https://issues.apache.org/jira/browse/HDFS-200 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.20-append Reporter: Tsz Wo (Nicholas), SZE Assignee: dhruba borthakur Priority: Blocker Fix For: 0.20-append Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, namenode.log, namenode.log, reopen_test.sh In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-200: - Fix Version/s: 0.20.205.0 In HDFS, sync() not yet guarantees data available to the new readers Key: HDFS-200 URL: https://issues.apache.org/jira/browse/HDFS-200 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.20-append Reporter: Tsz Wo (Nicholas), SZE Assignee: dhruba borthakur Priority: Blocker Fix For: 0.20-append, 0.20.205.0 Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, namenode.log, namenode.log, reopen_test.sh In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096250#comment-13096250 ] Suresh Srinivas commented on HDFS-200: -- I committed this change to 0.20-security. In HDFS, sync() not yet guarantees data available to the new readers Key: HDFS-200 URL: https://issues.apache.org/jira/browse/HDFS-200 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.20-append Reporter: Tsz Wo (Nicholas), SZE Assignee: dhruba borthakur Priority: Blocker Fix For: 0.20-append, 0.20.205.0 Attachments: 4379_20081010TC3.java, HDFS-200.20-security.1.patch, Reader.java, Reader.java, ReopenProblem.java, Writer.java, Writer.java, checkLeases-fix-1.txt, checkLeases-fix-unit-test-1.txt, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders15_20.txt, fsyncConcurrentReaders16_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, namenode.log, namenode.log, reopen_test.sh In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-988) saveNamespace race can corrupt the edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-988: - Fix Version/s: 0.20.205.0 saveNamespace race can corrupt the edits log Key: HDFS-988 URL: https://issues.apache.org/jira/browse/HDFS-988 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: dhruba borthakur Assignee: Eli Collins Priority: Blocker Fix For: 0.20-append, 0.20.205.0, 0.22.0 Attachments: 988-fixups.txt, HDFS-988.20-security.patch, HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-6.patch, hdfs-988-7.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch The administrator puts the namenode in safemode and then issues the saveNamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20. https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
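The race described in HDFS-988 above comes down to saveNamespace reading the edits log while another thread's logSync is mid-write. A minimal sketch of the fix direction — illustrative only, with hypothetical names, not the real FSEditLog locking — is to make both operations take the same lock, so the snapshot can never observe a partial record:

```java
// Toy model of the HDFS-988 race fix (hypothetical class). logSync and
// saveNamespace serialize on one lock, so a snapshot never sees a
// half-written edits record.
class EditLogSketch {
    private final Object lock = new Object();
    private final StringBuilder edits = new StringBuilder();

    // Append a whole transaction record atomically.
    void logSync(String txn) {
        synchronized (lock) {
            edits.append(txn).append('\n');
        }
    }

    // Snapshot the log; cannot run while a logSync holds the lock.
    String saveNamespace() {
        synchronized (lock) {
            return edits.toString();
        }
    }
}
```

Without the shared lock, saveNamespace could copy the log between the append of a record's first bytes and its last, which is exactly the partial-write corruption the ticket reports.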
[jira] [Updated] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-826: -- Attachment: HDFS-826.20-security.1.patch Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline Key: HDFS-826 URL: https://issues.apache.org/jira/browse/HDFS-826 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.20-append Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.20-append, 0.21.0 Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, HDFS-826.20-security.1.patch, Replicable4.txt, ReplicableHdfs.txt, ReplicableHdfs2.txt, ReplicableHdfs3.txt HDFS does not replicate the last block of the file that is currently being written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written file. This situation can be improved if the application can be notified of a datanode death in the write pipeline. Then, the application can decide what is the right course of action to be taken on this event. In our use-case, the application can close the file on the first datanode death, and start writing to a newly created file. This ensures that the reliability guarantee of a block is close to 3 at all times. One idea is to make DFSOutputStream.write() throw an exception if the number of datanodes in the write pipeline falls below the minimum.replication.factor that is set on the client (this is backward compatible). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096253#comment-13096253 ] Jitendra Nath Pandey commented on HDFS-826: --- Uploaded patch for 20-security branch. Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline Key: HDFS-826 URL: https://issues.apache.org/jira/browse/HDFS-826 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.20-append Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.20-append, 0.21.0 Attachments: HDFS-826-0.20-v2.patch, HDFS-826-0.20.patch, HDFS-826.20-security.1.patch, Replicable4.txt, ReplicableHdfs.txt, ReplicableHdfs2.txt, ReplicableHdfs3.txt HDFS does not replicate the last block of the file that is currently being written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written file. This situation can be improved if the application can be notified of a datanode death in the write pipeline. Then, the application can decide what is the right course of action to be taken on this event. In our use-case, the application can close the file on the first datanode death, and start writing to a newly created file. This ensures that the reliability guarantee of a block is close to 3 at all times. One idea is to make DFSOutputStream.write() throw an exception if the number of datanodes in the write pipeline falls below the minimum.replication.factor that is set on the client (this is backward compatible). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
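The idea floated in the HDFS-826 description — make write() throw once the pipeline shrinks below a client-side minimum — can be sketched as follows. This is an illustrative model with hypothetical names, not the real DFSOutputStream:

```java
import java.io.IOException;

// Toy model of the HDFS-826 proposal (hypothetical class): write()
// fails fast when the live pipeline drops below the client's minimum,
// letting the application close the file and reopen a new one.
class PipelineSketch {
    private int liveDatanodes;
    private final int minReplication;

    PipelineSketch(int liveDatanodes, int minReplication) {
        this.liveDatanodes = liveDatanodes;
        this.minReplication = minReplication;
    }

    void datanodeDied() { liveDatanodes--; }

    void write(byte[] data) throws IOException {
        if (liveDatanodes < minReplication)
            throw new IOException("pipeline has " + liveDatanodes
                + " datanodes, below minimum " + minReplication);
        // ... stream data to the remaining pipeline ...
    }
}
```

Because clients that leave the minimum at its old effective value never see the new exception, surfacing the condition this way stays backward compatible, as the description notes.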
[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096254#comment-13096254 ] Suresh Srinivas commented on HDFS-988: -- +1 for the 20-security patch. saveNamespace race can corrupt the edits log Key: HDFS-988 URL: https://issues.apache.org/jira/browse/HDFS-988 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: dhruba borthakur Assignee: Eli Collins Priority: Blocker Fix For: 0.20-append, 0.20.205.0, 0.22.0 Attachments: 988-fixups.txt, HDFS-988.20-security.patch, HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-6.patch, hdfs-988-7.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20. https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
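The race described in HDFS-988 is between saveNamespace and logSync calls still in flight from other threads. The shape of a fix — refuse new syncs and drain the pending ones before snapshotting — can be sketched as below. This is a toy model with invented names; the real patch works inside FSNamesystem/FSEditLog locking.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: saveNamespace must not run while logSync calls are
// still in flight, or the saved edits log can contain partial writes.
public class SaveNamespaceGuard {
    private final AtomicInteger pendingSyncs = new AtomicInteger();
    private volatile boolean saving = false;

    /** Called by a thread about to run logSync; false means "save in progress, back off". */
    public boolean beginSync() {
        if (saving) return false;        // simplified check; the real fix
        pendingSyncs.incrementAndGet();  // holds the namesystem lock instead
        return true;
    }

    /** Called when that thread's logSync has fully hit the stream. */
    public void endSync() {
        pendingSyncs.decrementAndGet();
    }

    /** Drain in-flight syncs, then it is safe to write fsimage / roll edits. */
    public void saveNamespace() {
        saving = true;
        while (pendingSyncs.get() > 0) {
            Thread.onSpinWait();         // wait for partial writes to complete
        }
        // ... snapshot fsimage and edits here ...
        saving = false;
    }
}
```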
[jira] [Updated] (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-630: -- Attachment: HDFS-630.20-security.1.patch Patch for 20-security branch uploaded. In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block. --- Key: HDFS-630 URL: https://issues.apache.org/jira/browse/HDFS-630 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client, name-node Affects Versions: 0.20-append Reporter: Ruyue Ma Assignee: Cosmin Lehene Fix For: 0.20-append, 0.21.0 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, HDFS-630.20-security.1.patch, HDFS-630.patch, hdfs-630-0.20-append.patch, hdfs-630-0.20.txt created from hdfs-200. If during a write, the dfsclient sees that a block replica location for a newly allocated block is not connectable, it re-requests the NN to get a fresh set of replica locations for the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds between each retry (see DFSClient.nextBlockOutputStream). This setting works well when you have a reasonably sized cluster; if you have few datanodes in the cluster, every retry may pick the dead datanode and the above logic bails out. Our solution: when getting block locations from the namenode, we give the NN the excluded datanodes. The list of dead datanodes is only for one block allocation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
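The excluded-datanodes idea in HDFS-630 boils down to the client handing the NameNode a blacklist that is honored for a single block allocation only. A toy version of the target-selection step, with invented names (the real logic lives in the NameNode's block placement policy):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch: pick pipeline targets while skipping datanodes the
// client reported as unreachable for this one block allocation.
public class ExcludingAllocator {

    public static List<String> chooseTargets(List<String> allDatanodes,
                                             Set<String> excluded,
                                             int wanted) {
        List<String> targets = new ArrayList<>();
        for (String dn : allDatanodes) {
            if (!excluded.contains(dn)) {
                targets.add(dn);        // candidate the client can still reach
            }
            if (targets.size() == wanted) {
                break;                  // enough replicas located
            }
        }
        return targets;  // may be shorter than wanted on a tiny cluster
    }
}
```

On a small cluster this matters because, without the exclusion list, every one of the dfs.client.block.write.retries attempts can be handed the same dead node.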
[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096261#comment-13096261 ] Suresh Srinivas commented on HDFS-988: -- I committed the patch to 0.20-security branch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096262#comment-13096262 ] Suresh Srinivas commented on HDFS-988: -- +1 for the patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1054) Remove unnecessary sleep after failure in nextBlockOutputStream
[ https://issues.apache.org/jira/browse/HDFS-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1054: --- Attachment: HDFS-1054.20-security.1.patch Patch for 20-security branch. Remove unnecessary sleep after failure in nextBlockOutputStream --- Key: HDFS-1054 URL: https://issues.apache.org/jira/browse/HDFS-1054 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.20.3, 0.20-append, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.20-append, 0.21.0 Attachments: HDFS-1054.20-security.1.patch, hdfs-1054-0.20-append.txt, hdfs-1054.txt, hdfs-1054.txt If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds before retrying. I don't see a great reason to wait at all, much less 6 seconds (especially now that HDFS-630 ensures that a retry won't go back to the bad node). We should at least make it configurable, and perhaps something like backoff makes some sense. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
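HDFS-1054 suggests making the retry delay configurable, "and perhaps something like backoff makes some sense." A hypothetical helper illustrating capped exponential backoff — not code from the actual patch:

```java
// Illustrative sketch: exponential backoff with a cap, as an alternative to
// DFSOutputStream's fixed 6-second sleep between pipeline-creation retries.
public class RetryBackoff {

    /**
     * Delay before retry number {@code attempt} (0-based): base * 2^attempt,
     * capped at capMillis. The shift is clamped to avoid long overflow.
     */
    public static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, capMillis);
    }
}
```

With a base of 400 ms and a 6-second cap, the first retry is nearly immediate and the old 6-second worst case is only reached after several failures.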
[jira] [Updated] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-826: - Fix Version/s: 0.20.205.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096267#comment-13096267 ] Suresh Srinivas commented on HDFS-826: -- +1 for the patch. I committed the patch to 0.20-security branch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1141) completeFile does not check lease ownership
[ https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1141: --- Attachment: HDFS-1141.20-security.1.patch Patch for 20-security uploaded. completeFile does not check lease ownership --- Key: HDFS-1141 URL: https://issues.apache.org/jira/browse/HDFS-1141 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.20-append, 0.22.0 Attachments: HDFS-1141.20-security.1.patch, hdfs-1141-branch20.txt, hdfs-1141.txt, hdfs-1141.txt completeFile should check that the caller still owns the lease of the file that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' case in HDFS-1139. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
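The missing check in completeFile (HDFS-1141) amounts to comparing the file's current lease holder against the caller before finalizing the file. Sketched below with invented names; the actual fix lives in the NameNode's FSNamesystem:

```java
import java.io.IOException;

// Illustrative sketch: completeFile should verify the caller still owns the
// lease, so a stale client cannot finalize a file another writer holds.
public class LeaseCheck {

    public static void checkLease(String leaseHolder, String caller)
            throws IOException {
        if (leaseHolder == null || !leaseHolder.equals(caller)) {
            throw new IOException("Lease mismatch: file lease held by "
                + leaseHolder + ", but completeFile was called by " + caller);
        }
    }

    /** Convenience wrapper for testing the predicate directly. */
    public static boolean ownsLease(String leaseHolder, String caller) {
        try {
            checkLease(leaseHolder, caller);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```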
[jira] [Commented] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096275#comment-13096275 ] Suresh Srinivas commented on HDFS-142: -- Can you please add a banner to TestFileAppend4.java? In 0.20, move blocks being written into a blocksBeingWritten directory -- Key: HDFS-142 URL: https://issues.apache.org/jira/browse/HDFS-142 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20-append Reporter: Raghu Angadi Assignee: dhruba borthakur Priority: Blocker Fix For: 0.20-append Attachments: HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, HDFS-142-multiple-blocks-datanode-exception.patch, HDFS-142.20-security.1.patch, HDFS-142_20-append2.patch, HDFS-142_20.patch, appendFile-recheck-lease.txt, appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, dont-recover-rwr-when-rbw-available.txt, handleTmp1.patch, hdfs-142-commitBlockSynchronization-unknown-datanode.txt, hdfs-142-minidfs-fix-from-409.txt, hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, hdfs-142-testleaserecovery-fix.txt, recentInvalidateSets-assertion-fix.txt, recover-rbw-v2.txt, testfileappend4-deaddn.txt, validateBlockMetaData-synchronized.txt Before 0.18, when the Datanode restarts, it deletes files under the data-dir/tmp directory since these files are not valid anymore. But in 0.18 it moves these files to the normal directory, incorrectly making them valid blocks. One of the following would work: - remove the tmp files during upgrade, or - if the files under /tmp are in pre-18 format (i.e. no generation), delete them. Currently the effect of this bug is that these files end up failing block verification and eventually get deleted, but they cause incorrect over-replication at the namenode before that. Also it looks like our policy regarding treating files under tmp needs to be defined better. Right now there are probably one or two more bugs with it.
Dhruba, please file them if you remember. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-142: - Fix Version/s: 0.20.205.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-142: -- Attachment: HDFS-142.20-security.2.patch Added Apache License header. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder
[ https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1204: --- Attachment: HDFS-1204.20-security.1.patch Patch for 20-security. 0.20: Lease expiration should recover single files, not entire lease holder --- Key: HDFS-1204 URL: https://issues.apache.org/jira/browse/HDFS-1204 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: sam rash Fix For: 0.20-append Attachments: HDFS-1204.20-security.1.patch, hdfs-1204.txt, hdfs-1204.txt This was brought up in HDFS-200 but didn't make it into the branch on Apache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1057: --- Attachment: HDFS-1057.20-security.1.patch Patch for 20-security branch uploaded. Concurrent readers hit ChecksumExceptions if following a writer to very end of file --- Key: HDFS-1057 URL: https://issues.apache.org/jira/browse/HDFS-1057 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: sam rash Priority: Blocker Fix For: 0.20-append, 0.21.0, 0.22.0 Attachments: HDFS-1057-0.20-append.patch, HDFS-1057.20-security.1.patch, conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, hdfs-1057-trunk-6.txt In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it's possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
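The fix direction for HDFS-1057 is an ordering constraint: bytes must reach the disk before the replica's visible length is advanced, so a concurrent reader never sees a length covering bytes that are still in BlockReceiver's buffers. A toy model of the corrected ordering — all names hypothetical, with strings standing in for the buffer and the block file:

```java
// Illustrative sketch: a "block receiver" that only advances the length
// visible to concurrent readers AFTER flushing its write buffer.
public class VisibleLengthDemo {
    private final StringBuilder buffer = new StringBuilder(); // in-memory buffer
    private final StringBuilder disk = new StringBuilder();   // stands in for the block file
    private long visibleLength = 0;                           // what readers may read up to

    public void receivePacket(String data) {
        buffer.append(data);
        flush();                        // flush FIRST...
        visibleLength = disk.length();  // ...then advertise the new length
    }

    private void flush() {
        disk.append(buffer);            // simulate fsync of buffered bytes
        buffer.setLength(0);
    }

    public long getVisibleLength() { return visibleLength; }

    /** What a concurrent reader is allowed to see; never exceeds flushed data. */
    public String readVisible() { return disk.substring(0, (int) visibleLength); }
}
```

The bug report describes the opposite order — setBytesOnDisk before flush() — which opens the window for a reader to hit an EOF or a checksum error on bytes that have not landed yet.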
[jira] [Updated] (HDFS-724) Pipeline close hangs if one of the datanodes is not responsive.
[ https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-724: -- Attachment: HDFS-724.20-security.1.patch Patch for 20-security uploaded. Pipeline close hangs if one of the datanodes is not responsive. -- Key: HDFS-724 URL: https://issues.apache.org/jira/browse/HDFS-724 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.21.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Hairong Kuang Priority: Blocker Fix For: 0.20-append, 0.21.0 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, pipelineHeartbeat2.patch, stuckWriteAppend20.patch In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
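The missing piece HDFS-724 points at is a bound on how long the client waits for the ack of the empty close packet. A generic sketch of a bounded ack wait — not the actual ResponseProcessor code, all names invented:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: waiting for the close-packet ack with a timeout,
// so an unresponsive datanode cannot hang the pipeline close forever.
public class CloseAckWaiter {
    private final BlockingQueue<Long> acks = new LinkedBlockingQueue<>();

    /** Called by the thread reading ack responses from the pipeline. */
    public void ackReceived(long seqno) {
        acks.offer(seqno);
    }

    /** True if the close ack arrived within the timeout; false otherwise. */
    public boolean awaitCloseAck(long timeoutMillis) {
        try {
            return acks.poll(timeoutMillis, TimeUnit.MILLISECONDS) != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }
}
```

On a false return the client can abandon the pipeline and recover, instead of blocking indefinitely on the dead node.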
[jira] [Commented] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096328#comment-13096328 ] Suresh Srinivas commented on HDFS-142: -- +1 for the patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
[ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096333#comment-13096333 ] Suresh Srinivas commented on HDFS-142: -- I committed the patch to the 0.20-security branch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
[ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-895: -- Attachment: HDFS-895.20-security.1.patch Patch uploaded for 20-security. Allow hflush/sync to occur in parallel with new writes to the file -- Key: HDFS-895 URL: https://issues.apache.org/jira/browse/HDFS-895 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Todd Lipcon Fix For: 0.20-append, 0.22.0 Attachments: 895-delta-for-review.txt, HDFS-895.20-security.1.patch, hdfs-895-0.20-append.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt, hdfs-895-branch-20-append.txt, hdfs-895-ontopof-1497.txt, hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized. This means that if an hflush/sync is in progress, an application cannot write data to the HDFS client buffer. This reduces the write throughput of the transaction log in HBase. The hflush/sync should allow new writes to happen to the HDFS client even when an hflush/sync is in progress. It can record the seqno of the message for which it should receive the ack, indicate to the DataStream thread to start flushing those messages, exit the synchronized section and just wait for that ack to arrive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
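The scheme in HDFS-895 — record the seqno to wait for, leave the synchronized section, then wait for the ack outside the write lock — can be modeled as below. This is an illustrative skeleton with invented names, not the real DFSOutputStream; the actual data buffering and DataStreamer signaling are omitted.

```java
// Illustrative sketch: hflush records its target seqno under the write lock,
// then waits on a separate monitor, so concurrent write() calls can proceed.
public class ParallelHflush {
    private long lastQueuedSeqno = 0;           // guarded by 'this'
    private long lastAckedSeqno = 0;            // guarded by 'ackMonitor'
    private final Object ackMonitor = new Object();

    /** Queue a chunk; returns its sequence number. Holds the write lock briefly. */
    public synchronized long write(byte[] chunk) {
        return ++lastQueuedSeqno;               // real code also buffers the chunk
    }

    /** Called by the response-processing thread as pipeline acks arrive. */
    public void ackUpTo(long seqno) {
        synchronized (ackMonitor) {
            lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
            ackMonitor.notifyAll();
        }
    }

    /** Flush: snapshot the target seqno, then wait OUTSIDE the write lock. */
    public void hflush() {
        long target;
        synchronized (this) {
            target = lastQueuedSeqno;           // everything queued so far
        }
        synchronized (ackMonitor) {             // new write() calls are NOT blocked here
            while (lastAckedSeqno < target) {
                try {
                    ackMonitor.wait(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

The key property is that hflush() never holds the write lock while waiting, so HBase's log writers can keep filling the client buffer during a flush.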
[jira] [Commented] (HDFS-2288) Replicas awaiting recovery should return a full visible length
[ https://issues.apache.org/jira/browse/HDFS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096336#comment-13096336 ] Todd Lipcon commented on HDFS-2288: --- Nicholas: given the above, do you think this patch is correct? Replicas awaiting recovery should return a full visible length -- Key: HDFS-2288 URL: https://issues.apache.org/jira/browse/HDFS-2288 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.23.0 Attachments: hdfs-2288.txt Currently, if the client calls getReplicaVisibleLength for a RWR, it returns a visible length of 0. This causes one of HBase's tests to fail, and I believe it's incorrect behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
[ https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1520: --- Attachment: HDFS-1520.20-security.1.patch Patch for 20-security. HDFS 20 append: Lightweight NameNode operation to trigger lease recovery Key: HDFS-1520 URL: https://issues.apache.org/jira/browse/HDFS-1520 Project: Hadoop HDFS Issue Type: New Feature Components: name-node Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append Attachments: HDFS-1520.20-security.1.patch, recoverLeaseApache20.patch Currently HBase uses append to trigger the close of HLog during HLog split. Append is a very expensive operation, which involves not only NameNode operations but also creating a write pipeline. If one of the datanodes on the pipeline has a problem, this recovery may take minutes. I'd like to implement a lightweight NameNode operation to trigger lease recovery and make HBase use this instead. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered
[ https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1555: --- Attachment: HDFS-1555.20-security.1.patch Patch for 20-security. HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered - Key: HDFS-1555 URL: https://issues.apache.org/jira/browse/HDFS-1555 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append Attachments: HDFS-1555.20-security.1.patch, appendRecoveryRace.patch, recoveryRace.patch When a file is under lease recovery and the writer is still alive, the write pipeline will be killed and then the writer will start a pipeline recovery. Sometimes the pipeline recovery may win the race against the lease recovery and, as a result, cause the lease recovery to fail. This is very bad if we want to support the strong recoverLease semantics in HDFS-1554. So it would be nice if we could disallow a file's pipeline recovery while its lease recovery is in progress. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-630: - Fix Version/s: 0.20.205.0 I committed the patch to 0.20-security branch. In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block. --- Key: HDFS-630 URL: https://issues.apache.org/jira/browse/HDFS-630 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client, name-node Affects Versions: 0.20-append Reporter: Ruyue Ma Assignee: Cosmin Lehene Fix For: 0.20-append, 0.20.205.0, 0.21.0 Attachments: 0001-Fix-HDFS-630-0.21-svn-1.patch, 0001-Fix-HDFS-630-0.21-svn-2.patch, 0001-Fix-HDFS-630-0.21-svn.patch, 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 0001-Fix-HDFS-630-trunk-svn-2.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-3.patch, 0001-Fix-HDFS-630-trunk-svn-4.patch, HDFS-630.20-security.1.patch, HDFS-630.patch, hdfs-630-0.20-append.patch, hdfs-630-0.20.txt created from hdfs-200. If, during a write, the dfsclient sees that a block replica location for a newly allocated block is not connectable, it re-requests the NN for a fresh set of replica locations for the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds between each retry (see DFSClient.nextBlockOutputStream). This setting works well when you have a reasonably sized cluster; if you have few datanodes in the cluster, every retry may pick the dead datanode and the above logic bails out. Our solution: when getting block locations from the namenode, we give the NN the excluded datanodes. The list of dead datanodes is only for one block allocation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
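The proposed fix — the client passing its known-dead datanodes back to the NameNode so a retry on a small cluster cannot keep landing on the same bad node — can be sketched on the NameNode side like this. Names and signatures are illustrative, not the actual block-placement code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the HDFS-630 idea: choose block targets while skipping
// datanodes the client has reported as unreachable for this allocation.
public class BlockPlacementSketch {
    /** NameNode side: pick up to `replicas` targets, excluding dead nodes. */
    public static List<String> chooseTargets(List<String> liveNodes,
                                             Set<String> excluded, int replicas) {
        List<String> targets = new ArrayList<>();
        for (String dn : liveNodes) {
            if (targets.size() == replicas) break;
            if (!excluded.contains(dn)) targets.add(dn); // skip client-reported dead nodes
        }
        return targets;
    }
}
```

Because the excluded list is scoped to a single block allocation, a node that was briefly unreachable is still eligible for later blocks.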
[jira] [Updated] (HDFS-1554) Append 0.20: New semantics for recoverLease
[ https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1554: --- Attachment: HDFS-1554.20-security.1.patch Patch for 20-security. Append 0.20: New semantics for recoverLease --- Key: HDFS-1554 URL: https://issues.apache.org/jira/browse/HDFS-1554 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append Attachments: HDFS-1554.20-security.1.patch, appendRecoverLease.patch, appendRecoverLease1.patch The current recoverLease API implemented in append 0.20 aims to provide a lighter-weight way (compared to using create/append) to trigger a file's soft lease expiration. From the use cases of both HBase and Scribe, it could have stronger semantics: revoking the file's lease, thus starting lease recovery immediately. Also, I'd like to port this recoverLease API to HDFS 0.22 and trunk since HBase is moving to HDFS 0.22. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1054) Remove unnecessary sleep after failure in nextBlockOutputStream
[ https://issues.apache.org/jira/browse/HDFS-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1054: -- Fix Version/s: 0.20.205.0 +1 for the patch. I committed the patch to 0.20-security branch. Remove unnecessary sleep after failure in nextBlockOutputStream --- Key: HDFS-1054 URL: https://issues.apache.org/jira/browse/HDFS-1054 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.20.3, 0.20-append, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.20-append, 0.20.205.0, 0.21.0 Attachments: HDFS-1054.20-security.1.patch, hdfs-1054-0.20-append.txt, hdfs-1054.txt, hdfs-1054.txt If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds before retrying. I don't see a great reason to wait at all, much less 6 seconds (especially now that HDFS-630 ensures that a retry won't go back to the bad node). We should at least make it configurable, and perhaps something like backoff makes some sense. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
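The suggestion above — make the retry delay configurable, perhaps with backoff — could look like the following. This is a sketch of the suggestion, not code from the patch; the idea of an exponential schedule capped at the old 6-second value is an assumption for illustration.

```java
// Sketch for HDFS-1054: replace the fixed 6-second sleep in
// nextBlockOutputStream with a configurable exponential backoff.
public class RetryBackoff {
    private final long baseMillis; // would come from a client config knob

    public RetryBackoff(long baseMillis) {
        this.baseMillis = baseMillis;
    }

    /** Delay before retry attempt n (0-based): base * 2^n, capped at the old 6s. */
    public long delayMillis(int attempt) {
        long d = baseMillis << attempt; // exponential growth
        return Math.min(d, 6000L);
    }
}
```

With a small base (or zero), the first retry is effectively immediate, which matches the observation that waiting buys nothing once HDFS-630 keeps retries off the bad node.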
[jira] [Updated] (HDFS-1207) 0.20-append: stallReplicationWork should be volatile
[ https://issues.apache.org/jira/browse/HDFS-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1207: -- Fix Version/s: 0.20.205.0 I applied the attached patch to 0.20-security. 0.20-append: stallReplicationWork should be volatile Key: HDFS-1207 URL: https://issues.apache.org/jira/browse/HDFS-1207 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.20-append, 0.20.205.0 Attachments: hdfs-1207.txt the stallReplicationWork member in FSNamesystem is accessed by multiple threads without synchronization, but isn't marked volatile. I believe this is responsible for about 1% failure rate on TestFileAppend4.testAppendSyncChecksum* on my 8-core test boxes (looking at logs I see replication happening even though we've supposedly disabled it) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
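The visibility bug described above comes down to one missing keyword: without `volatile`, a thread spinning on the flag may never observe another thread's write. The following standalone illustration (not the FSNamesystem code) shows the fixed form, where `volatile` guarantees the update becomes visible.

```java
// Minimal demonstration of why stallReplicationWork needs `volatile`
// (HDFS-1207): cross-thread writes to a plain field are not guaranteed
// to be seen by readers.
public class StallFlag {
    private volatile boolean stallReplicationWork = false;

    public void stall() {
        stallReplicationWork = true;
    }

    /** Spin until another thread sets the flag; volatile makes this terminate. */
    public void awaitStall() {
        while (!stallReplicationWork) {
            Thread.onSpinWait(); // hint to the runtime that we're busy-waiting
        }
    }
}
```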
[jira] [Updated] (HDFS-1141) completeFile does not check lease ownership
[ https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1141: -- Fix Version/s: 0.20.205.0 +1 for the patch. I committed it to 0.20-security branch. completeFile does not check lease ownership --- Key: HDFS-1141 URL: https://issues.apache.org/jira/browse/HDFS-1141 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.20-append, 0.20.205.0, 0.22.0 Attachments: HDFS-1141.20-security.1.patch, hdfs-1141-branch20.txt, hdfs-1141.txt, hdfs-1141.txt completeFile should check that the caller still owns the lease of the file that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' case in HDFS-1139. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
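The missing check can be sketched as follows: completeFile must verify that the caller still holds the lease before closing the file. This is an illustrative model, not the NameNode implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HDFS-1141 fix: reject completeFile from a caller that
// lost (or never held) the lease on the file.
public class CompleteFileCheck {
    private final Map<String, String> leaseHolder = new HashMap<>(); // path -> holder

    public void open(String path, String holder) {
        leaseHolder.put(path, holder);
    }

    public void completeFile(String path, String holder) {
        String owner = leaseHolder.get(path);
        if (owner == null || !owner.equals(holder)) {
            // The fix: ownership is checked before the file is completed.
            throw new IllegalStateException("Lease on " + path + " not owned by " + holder);
        }
        leaseHolder.remove(path); // file closed, lease released
    }
}
```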
[jira] [Updated] (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder
[ https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1204: -- Fix Version/s: 0.20.205.0 +1 for the patch. I committed it to 0.20-security branch. 0.20: Lease expiration should recover single files, not entire lease holder --- Key: HDFS-1204 URL: https://issues.apache.org/jira/browse/HDFS-1204 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: sam rash Fix For: 0.20-append, 0.20.205.0 Attachments: HDFS-1204.20-security.1.patch, hdfs-1204.txt, hdfs-1204.txt This was brought up in HDFS-200 but didn't make it into the branch on Apache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1118) DFSOutputStream socket leak when cannot connect to DataNode
[ https://issues.apache.org/jira/browse/HDFS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096362#comment-13096362 ] Suresh Srinivas commented on HDFS-1118: --- I have committed the patch to 0.20-security branch. DFSOutputStream socket leak when cannot connect to DataNode --- Key: HDFS-1118 URL: https://issues.apache.org/jira/browse/HDFS-1118 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.1, 0.20.2, 0.20-append, 0.21.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.20-append, 0.20.205.0, 0.22.0 Attachments: HDFS-1118.1.patch, HDFS-1118.2.patch, hdfs-1118.20s.patch, trunkPatch.txt The offending code is in {{DFSOutputStream.nextBlockOutputStream}}. This function retries several times to call {{createBlockOutputStream}}. Each time it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}. That object is never closed, but is overwritten the next time {{createBlockOutputStream}} is called. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
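The leak pattern and its fix can be shown in miniature: each failed attempt must close the socket it opened before the retry overwrites the reference. `FakeSocket` below stands in for `java.net.Socket` so the example is self-contained; this is a sketch, not the patched DFSClient code.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Sketch of the HDFS-1118 fix: a retry loop that never orphans a socket.
public class RetryCloseSketch {
    static class FakeSocket implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    /** Retry loop that closes the previous socket before each new attempt. */
    public static List<FakeSocket> connectWithRetries(int attempts) {
        List<FakeSocket> opened = new ArrayList<>();
        FakeSocket s = null;
        for (int i = 0; i < attempts; i++) {
            if (s != null) s.close(); // the fix: release the failed attempt's socket
            s = new FakeSocket();
            opened.add(s);
        }
        return opened;
    }
}
```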
[jira] [Commented] (HDFS-1202) DataBlockScanner throws NPE when updated before initialized
[ https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096364#comment-13096364 ] Suresh Srinivas commented on HDFS-1202: --- I committed the patch to 0.20-security branch. DataBlockScanner throws NPE when updated before initialized --- Key: HDFS-1202 URL: https://issues.apache.org/jira/browse/HDFS-1202 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20-append, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.20-append, 0.20.205.0, 0.22.0 Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.20s.patch, hdfs-1202.txt Missing an isInitialized() check in updateScanStatusInternal -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-1346: --- Attachment: HDFS-1346.20-security.1.patch Patch for 20-security, ported from 20-append. DFSClient receives out of order packet ack -- Key: HDFS-1346 URL: https://issues.apache.org/jira/browse/HDFS-1346 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append Attachments: HDFS-1346.20-security.1.patch, blockrecv-diff.txt, outOfOrder.patch When running 0.20 patched with HDFS-101, we sometimes see an error as follows: WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2871223654872350746_21421120 java.io.IOException: Responseprocessor: Expecting seq no for block blk_-2871223654872350746_21421120 10280 but received 10281 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570) This indicates that the DFS client expects an ack for packet N, but receives an ack for packet N+1. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
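The check that produces the quoted error can be sketched like this: the client's response processor requires acks to arrive in strict sequence-number order and fails the stream otherwise. This is an illustration of the invariant, not the actual ResponseProcessor code.

```java
import java.io.IOException;

// Sketch of the DFSClient ack-ordering invariant behind HDFS-1346.
public class AckSequenceChecker {
    private long expectedSeqno;

    public AckSequenceChecker(long firstSeqno) {
        this.expectedSeqno = firstSeqno;
    }

    /** Fail fast if an ack arrives out of order. */
    public void onAck(long seqno) throws IOException {
        if (seqno != expectedSeqno) {
            throw new IOException("Expecting seq no " + expectedSeqno
                + " but received " + seqno);
        }
        expectedSeqno++; // next packet's ack
    }
}
```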
[jira] [Closed] (HDFS-1729) Improve metrics for measuring NN startup costs.
[ https://issues.apache.org/jira/browse/HDFS-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley closed HDFS-1729. --- Improve metrics for measuring NN startup costs. --- Key: HDFS-1729 URL: https://issues.apache.org/jira/browse/HDFS-1729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Owen O'Malley Assignee: Matt Foley Fix For: 0.20.203.0 Current logging and metrics are insufficient to diagnose latency problems in cluster startup. Add: 1. better logs in both Datanode and Namenode for Initial Block Report processing, to help distinguish between block report processing problems and RPC/queuing problems; 2. new logs to measure cost of scanning all blocks for over/under/invalid replicas, which occurs in Namenode just before exiting safe mode; 3. new logs to measure cost of processing the under/invalid replica queues (created by the above mentioned scan), which occurs just after exiting safe mode, and is said to take 100% of CPU. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1779) After NameNode restart , Clients can not read partial files even after client invokes Sync.
[ https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096449#comment-13096449 ] Todd Lipcon commented on HDFS-1779: --- Mostly looks great. Small nits: - indentation in SimulatedFSDataset - there are some hard tabs in rejectAddStoredBlock - typo: 'unregisterted' Hairong, do you have a unit test? I have half of one here, similar to Uma's. After NameNode restart , Clients can not read partial files even after client invokes Sync. --- Key: HDFS-1779 URL: https://issues.apache.org/jira/browse/HDFS-1779 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.20-append Environment: Linux Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 0.20-append Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch In the append HDFS-200 issue, if a file has 10 blocks and the client invokes the sync method after writing 5 blocks, the NN will persist the block information in the edits log. If we restart the NN after this, all the DataNodes will re-register with the NN, but the DataNodes do not send the blocks-being-written information to the NN; DNs send the blocksBeingWritten information only at DN startup. So the NameNode cannot find which datanodes the 5 persisted blocks belong to. This information can be rebuilt from DN block reports; otherwise we will lose the information for these 5 blocks even though the NN persisted it in the edits log. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1346: -- Fix Version/s: 0.20.205.0 +1 for the patch. I committed it to 0.20-security branch. DFSClient receives out of order packet ack -- Key: HDFS-1346 URL: https://issues.apache.org/jira/browse/HDFS-1346 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.20.205.0 Attachments: HDFS-1346.20-security.1.patch, blockrecv-diff.txt, outOfOrder.patch When running 0.20 patched with HDFS-101, we sometimes see an error as follows: WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2871223654872350746_21421120 java.io.IOException: Responseprocessor: Expecting seq no for block blk_-2871223654872350746_21421120 10280 but received 10281 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570) This indicates that the DFS client expects an ack for packet N, but receives an ack for packet N+1. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.
[ https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096454#comment-13096454 ] Suresh Srinivas commented on HDFS-724: -- This patch does not compile for me. Pipeline close hangs if one of the datanode is not responsive. -- Key: HDFS-724 URL: https://issues.apache.org/jira/browse/HDFS-724 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.21.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Hairong Kuang Priority: Blocker Fix For: 0.20-append, 0.21.0 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, pipelineHeartbeat2.patch, stuckWriteAppend20.patch In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1161) Make DN minimum valid volumes configurable
[ https://issues.apache.org/jira/browse/HDFS-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096475#comment-13096475 ] Koji Noguchi commented on HDFS-1161: bq. IIRC Koji's perspective was that an admin doesn't want to specify the count or percent of valid volumes What I wanted was to keep the default behavior of shutting down the datanode when hitting a faulty volume for 0.21, since we were seeing missing blocks after HDFS-457. I don't have a preference between %disks and #disks. Make DN minimum valid volumes configurable -- Key: HDFS-1161 URL: https://issues.apache.org/jira/browse/HDFS-1161 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.21.0, 0.22.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Blocker Fix For: 0.21.0 Attachments: HDFS-1161-y20.patch, hdfs-1161-1.patch, hdfs-1161-2.patch, hdfs-1161-3.patch, hdfs-1161-4.patch, hdfs-1161-5.patch, hdfs-1161-6.patch The minimum number of non-faulty volumes to keep the DN active is hard-coded to 1. It would be useful to allow users to configure this value so the DN can be taken offline when, e.g., half of its disks fail; otherwise it doesn't get reported until it's down to its final disk and suffering degraded performance. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
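The configurable threshold described above amounts to a one-line predicate: shut down once the count of valid volumes drops below a configured minimum instead of the hard-coded 1. A minimal sketch, with the configuration source left abstract:

```java
// Sketch of the HDFS-1161 check: the DN stays up only while it has at
// least `minValidVolumes` healthy volumes. How the minimum is configured
// (count vs. percent) was still under discussion in the issue.
public class VolumeCheck {
    private final int minValidVolumes; // would be read from DN configuration

    public VolumeCheck(int minValidVolumes) {
        this.minValidVolumes = minValidVolumes;
    }

    public boolean shouldShutdown(int validVolumes) {
        return validVolumes < minValidVolumes;
    }
}
```

With `minValidVolumes = 1` this reproduces the old hard-coded behavior, so the default is backward compatible.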
[jira] [Updated] (HDFS-1779) After NameNode restart , Clients can not read partial files even after client invokes Sync.
[ https://issues.apache.org/jira/browse/HDFS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HDFS-1779: --- Fix Version/s: 0.20.205.0 Maybe we need to include this in the upcoming 0.20.205+ release that has append support. After NameNode restart , Clients can not read partial files even after client invokes Sync. --- Key: HDFS-1779 URL: https://issues.apache.org/jira/browse/HDFS-1779 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.20-append Environment: Linux Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 0.20-append, 0.20.205.0 Attachments: HDFS-1779.1.patch, HDFS-1779.patch, bbwReportAppend.patch In the append HDFS-200 issue, if a file has 10 blocks and the client invokes the sync method after writing 5 blocks, the NN will persist the block information in the edits log. If we restart the NN after this, all the DataNodes will re-register with the NN, but the DataNodes do not send the blocks-being-written information to the NN; DNs send the blocksBeingWritten information only at DN startup. So the NameNode cannot find which datanodes the 5 persisted blocks belong to. This information can be rebuilt from DN block reports; otherwise we will lose the information for these 5 blocks even though the NN persisted it in the edits log. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1057: -- Fix Version/s: 0.20.205.0 +1 for the patch. I committed it to 0.20-security. Concurrent readers hit ChecksumExceptions if following a writer to very end of file --- Key: HDFS-1057 URL: https://issues.apache.org/jira/browse/HDFS-1057 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: sam rash Priority: Blocker Fix For: 0.20-append, 0.20.205.0, 0.21.0, 0.22.0 Attachments: HDFS-1057-0.20-append.patch, HDFS-1057.20-security.1.patch, conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, hdfs-1057-trunk-6.txt In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it's possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2281) NPE in checkpoint during processIOError()
[ https://issues.apache.org/jira/browse/HDFS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096495#comment-13096495 ] Konstantin Shvachko commented on HDFS-2281: --- +1 I'll commit it to 0.22 branch. NPE in checkpoint during processIOError() - Key: HDFS-2281 URL: https://issues.apache.org/jira/browse/HDFS-2281 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Uma Maheswara Rao G Fix For: 0.22.0 Attachments: BN-bug-NPE.txt, HDFS-2281.1.patch, HDFS-2281.patch At the end of checkpoint BackupNode tries to convergeJournalSpool() and calls revertFileStreams(). The latter closes each file stream, and tries to rename the corresponding file to its permanent location current/edits. If for any reason the rename fails processIOError() is called for failed streams. processIOError() will try to close the stream again and will get NPE in EditLogFileOutputStream.close() because bufCurrent was set to null by the previous close. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
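The NPE above is the classic double-close problem: the first close() nulls the buffer, so a second close() from processIOError() must be a no-op rather than dereferencing the null field. A minimal sketch of the idempotent-close fix, modeled loosely on the described EditLogFileOutputStream behavior (the buffer here is a stand-in):

```java
// Sketch of the HDFS-2281 fix: make close() safe to call twice.
public class IdempotentCloseSketch {
    private StringBuilder bufCurrent = new StringBuilder();

    public void close() {
        if (bufCurrent == null) {
            return; // already closed: the guard that avoids the NPE
        }
        bufCurrent.setLength(0); // flush-and-release stand-in
        bufCurrent = null;       // first close releases the buffer
    }

    public boolean isClosed() {
        return bufCurrent == null;
    }
}
```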
[jira] [Commented] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.
[ https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096496#comment-13096496 ] Suresh Srinivas commented on HDFS-724: -- My bad, I had not applied the HDFS-1057 patch, which is required for this patch. +1 for the patch. Pipeline close hangs if one of the datanode is not responsive. -- Key: HDFS-724 URL: https://issues.apache.org/jira/browse/HDFS-724 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.21.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Hairong Kuang Priority: Blocker Fix For: 0.20-append, 0.21.0 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, pipelineHeartbeat2.patch, stuckWriteAppend20.patch In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
[ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096499#comment-13096499 ] Suresh Srinivas commented on HDFS-895: -- +1 for the patch. Allow hflush/sync to occur in parallel with new writes to the file -- Key: HDFS-895 URL: https://issues.apache.org/jira/browse/HDFS-895 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Todd Lipcon Fix For: 0.20-append, 0.22.0 Attachments: 895-delta-for-review.txt, HDFS-895.20-security.1.patch, hdfs-895-0.20-append.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt, hdfs-895-branch-20-append.txt, hdfs-895-ontopof-1497.txt, hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized. This means that if an hflush/sync is in progress, an application cannot write data to the HDFS client buffer. This reduces the write throughput of the transaction log in HBase. The hflush/sync should allow new writes to happen to the HDFS client even when an hflush/sync is in progress. It can record the seqno of the message for which it should receive the ack, indicate to the DataStream thread to start flushing those messages, exit the synchronized section and just wait for that ack to arrive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.
[ https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-724: - Fix Version/s: 0.20.205.0 I committed the patch to 0.20-security branch. Pipeline close hangs if one of the datanode is not responsive. -- Key: HDFS-724 URL: https://issues.apache.org/jira/browse/HDFS-724 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.21.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Hairong Kuang Priority: Blocker Fix For: 0.20-append, 0.20.205.0, 0.21.0 Attachments: HDFS-724.20-security.1.patch, h724_20091021.patch, hbAckReply.patch, pipelineHeartbeat.patch, pipelineHeartbeat1.patch, pipelineHeartbeat2.patch, stuckWriteAppend20.patch In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
[ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-895: - Fix Version/s: 0.20.205.0 I committed the patch to 0.20-security. Allow hflush/sync to occur in parallel with new writes to the file -- Key: HDFS-895 URL: https://issues.apache.org/jira/browse/HDFS-895 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Todd Lipcon Fix For: 0.20-append, 0.20.205.0, 0.22.0 Attachments: 895-delta-for-review.txt, HDFS-895.20-security.1.patch, hdfs-895-0.20-append.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt, hdfs-895-branch-20-append.txt, hdfs-895-ontopof-1497.txt, hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized. This means that if an hflush/sync is in progress, an application cannot write data to the HDFS client buffer. This reduces the write throughput of the transaction log in HBase. The hflush/sync should allow new writes to happen to the HDFS client even when an hflush/sync is in progress. It can record the seqno of the message for which it should receive the ack, indicate to the DataStream thread to start flushing those messages, exit the synchronized section and just wait for that ack to arrive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
[ https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096504#comment-13096504 ] Suresh Srinivas commented on HDFS-1520: --- +1 for the 0.20-security patch. HDFS 20 append: Lightweight NameNode operation to trigger lease recovery Key: HDFS-1520 URL: https://issues.apache.org/jira/browse/HDFS-1520 Project: Hadoop HDFS Issue Type: New Feature Components: name-node Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append Attachments: HDFS-1520.20-security.1.patch, recoverLeaseApache20.patch Currently HBase uses append to trigger the close of the HLog during HLog split. Append is a very expensive operation, which involves not only NameNode operations but also creating a write pipeline. If one of the datanodes in the pipeline has a problem, this recovery may take minutes. I'd like to implement a lightweight NameNode operation to trigger lease recovery and make HBase use this instead. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
[ https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1520: -- Fix Version/s: 0.20.205.0 I committed the patch to 0.20-security branch. HDFS 20 append: Lightweight NameNode operation to trigger lease recovery Key: HDFS-1520 URL: https://issues.apache.org/jira/browse/HDFS-1520 Project: Hadoop HDFS Issue Type: New Feature Components: name-node Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.20.205.0 Attachments: HDFS-1520.20-security.1.patch, recoverLeaseApache20.patch Currently HBase uses append to trigger the close of the HLog during HLog split. Append is a very expensive operation, which involves not only NameNode operations but also creating a write pipeline. If one of the datanodes in the pipeline has a problem, this recovery may take minutes. I'd like to implement a lightweight NameNode operation to trigger lease recovery and make HBase use this instead. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered
[ https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1555: -- Fix Version/s: 0.20.205.0 +1 for the patch. I have committed it to 0.20-security branch. HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered - Key: HDFS-1555 URL: https://issues.apache.org/jira/browse/HDFS-1555 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.20-append Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.20.205.0 Attachments: HDFS-1555.20-security.1.patch, appendRecoveryRace.patch, recoveryRace.patch When a file is under lease recovery and the writer is still alive, the write pipeline will be killed and then the writer will start a pipeline recovery. Sometimes the pipeline recovery may win the race against the lease recovery and, as a result, cause the lease recovery to fail. This is very bad if we want to support the strong recoverLease semantics in HDFS-1554. So it would be nice if we could disallow a file's pipeline recovery while its lease recovery is in progress. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1554) Append 0.20: New semantics for recoverLease
[ https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1554: -- Fix Version/s: 0.20.205.0 +1 for the patch. I committed it to 0.20-security. Append 0.20: New semantics for recoverLease --- Key: HDFS-1554 URL: https://issues.apache.org/jira/browse/HDFS-1554 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.20.205.0 Attachments: HDFS-1554.20-security.1.patch, appendRecoverLease.patch, appendRecoverLease1.patch The current recoverLease API implemented in append 0.20 aims to provide a lighter-weight way (compared to using create/append) to trigger a file's soft lease expiration. From the use cases of both HBase and Scribe, it could have stronger semantics: revoking the file's lease, thus starting lease recovery immediately. Also, I'd like to port this recoverLease API to HDFS 0.22 and trunk since HBase is moving to HDFS 0.22. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira