[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739288#comment-13739288 ]

Vinay commented on HDFS-4504:
-----------------------------

bq. It seems to me like it would be better to call completeFile() or perhaps some new abortFile() RPC, which would first verify that the client name trying to abort the lease is the same as the current lease holder.

This looks good, but it seems this would take a lot of code changes and leave a lot of cases to handle. It may also be difficult to handle the case where two threads, T1 and T2, both have the same client name C: the lease-holder check cannot tell them apart, since the client name is the same.

DFSOutputStream#close doesn't always release resources (such as leases)
-----------------------------------------------------------------------

Key: HDFS-4504
URL: https://issues.apache.org/jira/browse/HDFS-4504
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch

{{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases. So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the undead file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739293#comment-13739293 ]

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

I don't think adding a new RPC would be too bad. It would be very similar to recoverLease.

bq. It may also be difficult to handle the case where two threads, T1 and T2, both have the same client name C, since the client name is the same.

I think we should do this in HDFS-4688 rather than trying to solve it here.
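The close() semantics being discussed can be sketched as follows. This is an illustrative model of the desired behavior, not the actual DFSOutputStream code; the names (`LeakFreeOutputStream`, `flushInternal`, `leaseReleased`) are invented for the sketch. The point is that client-side state, in particular the lease-renewer registration, is released whether or not the final flush succeeds, and that a second close() becomes a no-op instead of rethrowing the old failure forever.

```java
import java.io.IOException;

class LeakFreeOutputStream {
    private boolean closed = false;
    boolean leaseReleased = false;     // stands in for LeaseRenewer removal
    private final boolean flushFails;  // simulate a pipeline-recovery failure

    LeakFreeOutputStream(boolean flushFails) { this.flushFails = flushFails; }

    private void flushInternal() throws IOException {
        if (flushFails) {
            throw new IOException("pipeline recovery failed");
        }
    }

    /** Releases resources on both the success and the failure path. */
    public void close() throws IOException {
        if (closed) {
            return;                    // idempotent: no endless rethrow
        }
        try {
            flushInternal();
        } finally {
            closed = true;
            leaseReleased = true;      // lease released even on IOException
        }
    }

    public static void main(String[] args) throws IOException {
        LeakFreeOutputStream bad = new LeakFreeOutputStream(true);
        try {
            bad.close();
        } catch (IOException e) {
            // expected: flush failed, but the lease was still released
        }
        assert bad.leaseReleased;
        bad.close();                   // second close is a harmless no-op
    }
}
```

An abortFile()-style RPC, as proposed above, would sit behind the `finally` branch so the NameNode can verify the caller is the current lease holder before dropping the lease.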
[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay updated HDFS-3618:
------------------------

Attachment: HDFS-3618.patch

Updated test

SSH fencing option may incorrectly succeed if nc (netcat) command not present
-----------------------------------------------------------------------------

Key: HDFS-3618
URL: https://issues.apache.org/jira/browse/HDFS-3618
Project: Hadoop HDFS
Issue Type: Bug
Components: auto-failover
Affects Versions: 2.0.0-alpha
Reporter: Brahma Reddy Battula
Assignee: Vinay
Attachments: HDFS-3618.patch, HDFS-3618.patch, HDFS-3618.patch, zkfc_threaddump.out, zkfc.txt

Started NNs and zkfcs on SUSE 11. SUSE 11 has netcat installed, and "netcat -z" works, but "nc -z" won't. While executing the following command we got "command not found", so rc was nonzero and we assumed the server was down. We end up returning success without actually checking whether the service is down:

{code}
LOG.info("Indeterminate response from trying to kill service. " +
    "Verifying whether it is running using nc...");
rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
    " " + serviceAddr.getPort());
if (rc == 0) {
  // the service is still listening - we are unable to fence
  LOG.warn("Unable to fence - it is running but we cannot kill it");
  return false;
} else {
  LOG.info("Verified that the service is down.");
  return true;
}
{code}
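One way to fix the bug above can be sketched as follows. This is not the actual SshFenceByTcpPort code; the helper class is invented for illustration. The key observation is that POSIX-compatible shells report exit status 127 when the command itself cannot be found, so a 127 from "nc -z host port" proves nothing about the service and must be treated as "cannot verify", not "service is down".

```java
class NcExitCode {
    static final int SERVICE_UP = 0;
    static final int COMMAND_NOT_FOUND = 127; // POSIX sh: command not found

    /** True only when nc actually ran and found the port closed. */
    static boolean verifiedDown(int rc) {
        if (rc == COMMAND_NOT_FOUND) {
            throw new IllegalStateException(
                "nc not found on target host; cannot verify service state");
        }
        return rc != SERVICE_UP;
    }

    public static void main(String[] args) {
        assert !verifiedDown(0);   // port still open: fencing failed
        assert verifiedDown(1);    // nc ran and the port was closed: fenced
    }
}
```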
[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739326#comment-13739326 ]

Vinay commented on HDFS-3618:
-----------------------------

The findbugs and javadoc warnings are unrelated; I am seeing them on every patch submitted. Maybe some problem with QA?
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739343#comment-13739343 ]

Colin Patrick McCabe commented on HDFS-5051:
--------------------------------------------

The random jitter code was taken from the block report code. The goal is the same: to avoid overloading the NameNode with too many reports arriving at the same time. I don't see any reason to take out the jitter code here, although it will not be as important as it was in the block report case.

As far as I can tell, genstamp and block length should not be included in the cache report. They aren't included in the regular block report in StorageBlockReportProto. When asking a DataNode to lock a block, the NameNode can specify the genstamp and minimum length it wants at that time, and the DataNode can fail the request if it doesn't have that genstamp / length.

This issue starts getting into the NN-to-DN communication (HDFS-5053). That's why I suggested discussing it there, although I'm happy to discuss it here as well.

Propagate cache status information from the DataNode to the NameNode
--------------------------------------------------------------------

Key: HDFS-5051
URL: https://issues.apache.org/jira/browse/HDFS-5051
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch

The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
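The jitter idea referred to above can be sketched as follows, assuming it mirrors the block-report delay (the class and method names are illustrative, not the actual DataNode code): delay each node's first report by a random offset within the report interval, so DataNodes restarted together do not all hit the NameNode at the same instant.

```java
import java.util.Random;

class CacheReportJitter {
    /** Uniform random initial delay in [0, intervalMs). */
    static long initialDelayMs(long intervalMs, Random rng) {
        // spreading first-report times flattens the load spike on the NN
        return (long) (rng.nextDouble() * intervalMs);
    }

    public static void main(String[] args) {
        long delay = initialDelayMs(10_000, new Random());
        assert delay >= 0 && delay < 10_000;
        System.out.println("first cache report delayed by " + delay + " ms");
    }
}
```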
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739356#comment-13739356 ]

Vinay commented on HDFS-2882:
-----------------------------

bq. Did you reproduce the problem? If so, what were the steps to reproduce?

Please check the test. I just reproduced the cases mentioned by Todd.

bq. Also, your patch seems to make the DataNode loop endlessly trying to initialize any block pools that don't come up. I don't think that's what we want to do here.

No. In a nameservice with multiple namenodes, the retry for one namenode is infinite only if at least one other namenode can connect and its BPOS is initialized. Retrying BPOS initialization continues only in that case; if all namenodes fail to initialize, BPOS exits.

One more thing: {{BPServiceActor#retrieveNamespaceInfo()}} is in an infinite loop, and yes, this can cause initialization to loop forever if the namenode is down or not responding. But that is not changed by my patch.

DN continues to start up, even if block pool fails to initialize
----------------------------------------------------------------

Key: HDFS-2882
URL: https://issues.apache.org/jira/browse/HDFS-2882
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
Attachments: HDFS-2882.patch, hdfs-2882.txt

I started a DN on a machine that was completely out of space on one of its drives. I saw the following:

2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-1297842002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335)

but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well.
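The retry rule Vinay describes can be sketched as follows. This is a hedged model of the policy, not the actual BPOfferService code; the class and flags are invented for illustration. A BPServiceActor that fails to initialize keeps retrying only while some other actor in the same nameservice initialized successfully; if none did, the block pool service exits instead of spinning forever.

```java
import java.util.List;

class BlockPoolRetryPolicy {
    /** One flag per namenode in the nameservice: did its actor initialize? */
    static boolean shouldRetryFailedActor(List<Boolean> actorInitialized) {
        // retry a failed actor only if at least one peer namenode is up;
        // otherwise the whole block pool should fail fast
        return actorInitialized.contains(Boolean.TRUE);
    }

    public static void main(String[] args) {
        // one NN up, one down: keep retrying the down one
        assert shouldRetryFailedActor(List.of(true, false));
        // both down: exit the block pool service
        assert !shouldRetryFailedActor(List.of(false, false));
    }
}
```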
[jira] [Created] (HDFS-5094) Add Metrics in DFSClient
LiuLei created HDFS-5094:
-------------------------

Summary: Add Metrics in DFSClient
Key: HDFS-5094
URL: https://issues.apache.org/jira/browse/HDFS-5094
Project: Hadoop HDFS
Issue Type: Task
Components: hdfs-client
Affects Versions: 2.0.5-alpha
Reporter: LiuLei

We need to add some metrics in DFSClient to help HBase monitor HDFS performance.
[jira] [Updated] (HDFS-5094) Add Metrics in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5094:
-------------------------

Attachment: DFSCLientMetrics.patch
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739367#comment-13739367 ]

Tao Luo commented on HDFS-5079:
-------------------------------

Replacing NNHAStatusHeartbeat.State with HAServiceState.

Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
-------------------------------------------------------------

Key: HDFS-5079
URL: https://issues.apache.org/jira/browse/HDFS-5079
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
Attachments: HDFS-5079.patch

NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos.
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Luo updated HDFS-5079:
--------------------------

Attachment: HDFS-5079.patch
[jira] [Assigned] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Luo reassigned HDFS-5079:
-----------------------------

Assignee: Tao Luo
[jira] [Created] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
LiuLei created HDFS-5095:
-------------------------

Summary: Using JournalNode IP as name of IPCLoggerChannel metrics record
Key: HDFS-5095
URL: https://issues.apache.org/jira/browse/HDFS-5095
Project: Hadoop HDFS
Issue Type: Task
Components: qjm
Affects Versions: 2.0.5-alpha
Reporter: LiuLei

I use QJM for HA. The IPCLoggerChannelMetrics class uses the NameNode as the metrics record name, so the metrics records of all JournalNodes are displayed together in ganglia. It would be better if every JournalNode displayed its metrics record under a different name; I think using the JournalNode IP as the name is better.
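A minimal sketch of the proposal, assuming the record name can simply be derived from the JournalNode's socket address. The naming scheme below is illustrative and the helper class is invented; the real IPCLoggerChannelMetrics API may differ. The point is that each JN gets a distinct record name, so ganglia displays each node's stats separately.

```java
import java.net.InetSocketAddress;

class JnMetricsRecordName {
    /** Per-JournalNode metrics record name, e.g. from its IP and port. */
    static String recordName(InetSocketAddress jnAddr) {
        return "IPCLoggerChannel-" + jnAddr.getAddress().getHostAddress()
            + "-" + jnAddr.getPort();
    }

    public static void main(String[] args) {
        InetSocketAddress jn = new InetSocketAddress("127.0.0.1", 8485);
        System.out.println(recordName(jn)); // IPCLoggerChannel-127.0.0.1-8485
    }
}
```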
[jira] [Updated] (HDFS-5094) Add Metrics in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5094:
-------------------------

Attachment: IPCLoggerChannelMetrics.java.patch
[jira] [Updated] (HDFS-5094) Add Metrics in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5094:
-------------------------

Attachment: (was: IPCLoggerChannelMetrics.java.patch)
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: IPCLoggerChannelMetrics.java.patch
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: metrics.jpg

Diagram of one JournalNode in ganglia
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: metrics.jpg
[jira] [Updated] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LiuLei updated HDFS-5095:
-------------------------

Attachment: (was: metrics.jpg)
[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739442#comment-13739442 ]

Hadoop QA commented on HDFS-3618:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597900/HDFS-3618.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4820//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4820//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4820//console

This message is automatically generated.
[jira] [Assigned] (HDFS-2933) Datanode index page on debug port not useful
[ https://issues.apache.org/jira/browse/HDFS-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ganesan reassigned HDFS-2933:
-----------------------------------

Assignee: Vivek Ganesan

Datanode index page on debug port not useful
--------------------------------------------

Key: HDFS-2933
URL: https://issues.apache.org/jira/browse/HDFS-2933
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Philip Zeyliger
Assignee: Vivek Ganesan
Labels: newbie

If you visit the root page of a datanode's web port, you get an index page with WEB-INF and robots.txt. More useful would be to include information about the datanode, like its version, and links to /browseDirectory, /jmx, /metrics, /conf, etc.
[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA
[ https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739519#comment-13739519 ]

Hudson commented on HDFS-5091:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #301 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/301/])

HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for secure HA. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java

Support for spnego keytab separate from the JournalNode keytab for secure HA
----------------------------------------------------------------------------

Key: HDFS-5091
URL: https://issues.apache.org/jira/browse/HDFS-5091
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Fix For: 2.1.1-beta
Attachments: HDFS-5091.001.patch

This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also use the web keytab file for the SPNEGO filter.
[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA
[ https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739654#comment-13739654 ]

Hudson commented on HDFS-5091:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1491 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1491/])

HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for secure HA. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java
[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA
[ https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739709#comment-13739709 ]

Hudson commented on HDFS-5091:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1518 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1518/])

HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for secure HA. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739707#comment-13739707 ]

Suresh Srinivas commented on HDFS-5051:
---------------------------------------

bq. The random jitter code was taken from the block report code. The goal is the same-- to avoid overloading the NameNode with too many reports at the same time. I don't see any reason to take out the jitter code here, although it will not be as important as it was in the block report case.

Quoting my own question:

bq. When a datanode starts, do we expect anything to be in the cache at all?

Hence the question: why is the jitter code important?

bq. They aren't included in the regular block report in StorageBlockReportProto.

That is not correct. Please see the code in BlockListAsLongs.

We need to decide the following (and I do not think, given the current summary of HDFS-5053, that that is the right place):
# Do we need to include generation stamp and length? My early thought is that it may not be necessary. The current code includes both generation stamp and length.
# When there are no cache entries in the datanode, my preference is not to send a cache report at all, including the first time the datanode starts up. I agree that we could have an incremental cache report.
[jira] [Comment Edited] (HDFS-3755) Creating an already-open-for-write file with overwrite=true fails
[ https://issues.apache.org/jira/browse/HDFS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739199#comment-13739199 ] Suresh Srinivas edited comment on HDFS-3755 at 8/14/13 2:33 PM: Given a regression from branch-1 was fixed in this Jira, why is it incompatible? was (Author: sureshms): Given a regression from branch-1 was fixed in this Jira, why is it incompatible? Creating an already-open-for-write file with overwrite=true fails - Key: HDFS-3755 URL: https://issues.apache.org/jira/browse/HDFS-3755 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0, 2.0.2-alpha Attachments: hdfs-3755.txt, hdfs-3755.txt If a file is already open for write by one client, and another client calls {{fs.create()}} with {{overwrite=true}}, the file should be deleted and the new file successfully created. Instead, it is currently throwing AlreadyBeingCreatedException. This is a regression since branch-1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739734#comment-13739734 ] John George commented on HDFS-2832: --- To support Storage Types, each DataNode must be treated as a collection of storages. (excerpt from pdf) Consider a cluster with a set of DataNodes with high-end hardware (e.g. SSD), and another set of DataNodes with low-end hardware (e.g. HDD). Each datanode is homogeneous by itself, but the cluster itself is heterogeneous. Can the user still specify storage preference using StorageType and get expected results? Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739738#comment-13739738 ] Vinay commented on HDFS-2882: - Hi [~tlipcon], could you take a look at the patch, as the patch is on top of your work. Thanks DN continues to start up, even if block pool fails to initialize Key: HDFS-2882 URL: https://issues.apache.org/jira/browse/HDFS-2882 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Attachments: HDFS-2882.patch, hdfs-2882.txt I started a DN on a machine that was completely out of space on one of its drives. I saw the following: 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-12978 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021 java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.init(FSDataset.java:335) but the DN continued to run, spewing NPEs when it tried to do block reports, etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739744#comment-13739744 ] Suresh Srinivas commented on HDFS-5095: --- Before this change the metrics name is NameNode as you noted. With the patch the metrics name is IPCLoggerChannel-address-port and is collected at the namenode. Instead of this, could we use a name such as NameNode-qjournal-Address-port? Also please add a unit test. Using JournalNode IP as name of IPCLoggerChannel metrics record --- Key: HDFS-5095 URL: https://issues.apache.org/jira/browse/HDFS-5095 Project: Hadoop HDFS Issue Type: Task Components: qjm Affects Versions: 2.0.5-alpha Reporter: LiuLei Attachments: IPCLoggerChannelMetrics.java.patch, metrics.jpg I use QJM for HA. The IPCLoggerChannelMetrics class uses NameNode as the metrics record name, so the metrics records of all JournalNodes are displayed together in ganglia. It would be better if every JournalNode displayed its metrics record under a different name. I think using the JournalNode IP as the name is better. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HDFS-5095) Using JournalNode IP as name of IPCLoggerChannel metrics record
[ https://issues.apache.org/jira/browse/HDFS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739744#comment-13739744 ] Suresh Srinivas edited comment on HDFS-5095 at 8/14/13 3:00 PM: Before this change the metrics name is NameNode as you noted. With the patch the metrics name is {noformat}IPCLoggerChannel-address-port{noformat} and is collected at the namenode. Instead of this, we could use a name that indicates both that these metrics are from the namenode and that they are related to the quorum journal, e.g. {noformat}NameNode-qjournal-Address-port{noformat} Also please add a unit test. was (Author: sureshms): Before this change the metrics name is NameNode as you noted. With the patch the metrics name is IPCLoggerChannel-address-port and is collected at the namenode. Instead of this, we could use a name such NameNode-qjournal-Address-port? Also please add a unit test. Using JournalNode IP as name of IPCLoggerChannel metrics record --- Key: HDFS-5095 URL: https://issues.apache.org/jira/browse/HDFS-5095 Project: Hadoop HDFS Issue Type: Task Components: qjm Affects Versions: 2.0.5-alpha Reporter: LiuLei Attachments: IPCLoggerChannelMetrics.java.patch, metrics.jpg I use QJM for HA. The IPCLoggerChannelMetrics class uses NameNode as the metrics record name, so the metrics records of all JournalNodes are displayed together in ganglia. It would be better if every JournalNode displayed its metrics record under a different name. I think using the JournalNode IP as the name is better. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739767#comment-13739767 ] Suresh Srinivas commented on HDFS-5076: --- Jing, some comments: # In the journal status, should we also return the address and port of the JournalNode? # The Javadoc says "A string presenting status for each journal." Do we want another method which takes a journal ID/namespaceID for the journal related to a specific namenode? Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying this information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4946) Allow preferLocalNode in BlockPlacementPolicyDefault to be configurable
[ https://issues.apache.org/jira/browse/HDFS-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739790#comment-13739790 ] Harsh J commented on HDFS-4946: --- bq. Allow preferLocalNode in BlockPlacementPolicyDefault to be disabled in configuration to prevent *a client* from writing the first replica of every block (i.e. the entire file) to the local DataNode. The description reads as preventing specific clients, but the config toggle would shut off all clients from writing locally, which may not be desirable. Ideally we would like a client-sent hint that influences the selection. Allow preferLocalNode in BlockPlacementPolicyDefault to be configurable --- Key: HDFS-4946 URL: https://issues.apache.org/jira/browse/HDFS-4946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: James Kinley Assignee: James Kinley Attachments: HDFS-4946-1.patch Allow preferLocalNode in BlockPlacementPolicyDefault to be disabled in configuration to prevent a client from writing the first replica of every block (i.e. the entire file) to the local DataNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-5079: -- Status: Patch Available (was: Open) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5079: - Hadoop Flags: Incompatible change Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739884#comment-13739884 ] Arpit Agarwal commented on HDFS-2832: - John, {quote}Can the user still specify storage preference using StorageType and get expected results? {quote} We don't make any assumptions about the cluster layout. The storages attached to a DataNode may be of the same or different types. Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf HDFS currently supports a configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection* of storages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739894#comment-13739894 ] Allen Wittenauer commented on HDFS-5087: Is there a reason this is preferred over just modifying hadoop-env.sh's service specific env opts? Allowing specific JAVA heap max setting for HDFS related services - Key: HDFS-5087 URL: https://issues.apache.org/jira/browse/HDFS-5087 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Kai Zheng Priority: Minor Attachments: HDFS-5087.patch This allows specific JAVA heap max setting for HDFS related services as it does for YARN services, to be consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739961#comment-13739961 ] Hadoop QA commented on HDFS-5079: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597906/HDFS-5079.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4821//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4821//console This message is automatically generated. Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. 
The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740054#comment-13740054 ] Tsz Wo (Nicholas), SZE commented on HDFS-4898: -- The failure of TestBlocksWithNotEnoughRacks is not related. It does not use BlockPlacementPolicyWithNodeGroup at all. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack - Key: HDFS-4898 URL: https://issues.apache.org/jira/browse/HDFS-4898 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.0.4-alpha Reporter: Eric Sirianni Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h4898_20130809.patch As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not properly fall back to the local rack when no nodes are available in remote racks, resulting in an improper {{NotEnoughReplicasException}}.
{code:title=BlockPlacementPolicyWithNodeGroup.java}
@Override
protected void chooseRemoteRack(int numOfReplicas,
    DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
    long blocksize, int maxReplicasPerRack, List<DatanodeDescriptor> results,
    boolean avoidStaleNodes) throws NotEnoughReplicasException {
  int oldNumOfReplicas = results.size();
  // randomly choose one node from remote racks
  try {
    chooseRandom(
        numOfReplicas,
        "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
        excludedNodes, blocksize, maxReplicasPerRack, results, avoidStaleNodes);
  } catch (NotEnoughReplicasException e) {
    chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
        localMachine.getNetworkLocation(), excludedNodes, blocksize,
        maxReplicasPerRack, results, avoidStaleNodes);
  }
}
{code}
As currently coded, the {{chooseRandom()}} call in the {{catch}} block will never succeed, as the set of nodes within the passed-in node path (e.g. {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes (both are the set of nodes within the same nodegroup as the node chosen for the first replica). The bug is that the fallback {{chooseRandom()}} call in the catch block should be passing in the _complement_ of the node path used in the initial {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: {code}NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()){code} This will yield the proper fallback behavior of choosing a random node from _within the same rack_, but still excluding those nodes _in the same nodegroup_. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
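To see why the proposed fix works, note that for a node at {{/rack1/nodegroup1/node}}, the network location is {{/rack1/nodegroup1}} and its first half is {{/rack1}}; choosing within {{/rack1}} while excluding the nodegroup's nodes yields a same-rack, different-nodegroup node. A standalone sketch of that path arithmetic (this helper is a simplified stand-in for {{NetworkTopology.getFirstHalf}}, not the actual Hadoop code):

```java
public class ScopeSketch {
    // Simplified stand-in: keep only the first path component, e.g.
    // "/rack1/nodegroup1" -> "/rack1".
    static String getFirstHalf(String networkLocation) {
        int second = networkLocation.indexOf('/', 1);
        return second < 0 ? networkLocation : networkLocation.substring(0, second);
    }

    public static void main(String[] args) {
        String nodeGroupPath = "/rack1/nodegroup1";
        // try block scope: everything OUTSIDE the rack (the "~" prefix means exclusion)
        System.out.println("try scope:      ~" + getFirstHalf(nodeGroupPath));
        // proposed catch block scope: the rack itself, the complement of the try scope
        System.out.println("fallback scope: " + getFirstHalf(nodeGroupPath));
    }
}
```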
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740103#comment-13740103 ] Kai Zheng commented on HDFS-5087: - In the hdfs script, with the following line {code} exec $JAVA -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS {code} if both JAVA_HEAP_MAX and a service-specific -Xmx (via the relevant *_OPTS) are set, then which one will be used? JAVA_HEAP_MAX is always defined in hadoop-config.sh. Even though the JVM has a clear definition for this case, IMO it would be better to avoid the ambiguity. The approach used in the patch to resolve the conflict is consistent with the YARN-related services. Allowing specific JAVA heap max setting for HDFS related services - Key: HDFS-5087 URL: https://issues.apache.org/jira/browse/HDFS-5087 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Kai Zheng Priority: Minor Attachments: HDFS-5087.patch This allows a specific JAVA heap max setting for HDFS related services as it does for YARN services, to be consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740145#comment-13740145 ] Andrew Wang commented on HDFS-4949: --- Hi Arun, On the read path comments, it might be elucidating to check out the zero-copy read API that Colin's working on at HDFS-4953. The idea is that clients always use the zero copy cursor to do reads, which behind the scenes will do an mmap'd read if the block is cached, or a normal copying read if the block is on disk or remote. It allows an {{isCached}}-type check via not setting a fallback buffer for copying reads. This will cause the cursor to throw an exception on read if the block is not cached. Finally, there's also a parameter for enabling short reads, which comes into play when a read spans block files. On YARN integration, I'd like to revisit that a little ways down the road since we're focusing on getting a basic prototype out. If you want to get started on it now, it'd be helpful if you could review the current RM plan in the doc, and sketch out how a YARN-based architecture would look. Centralized cache management in HDFS Key: HDFS-4949 URL: https://issues.apache.org/jira/browse/HDFS-4949 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher level application frameworks like Hive, Pig, and Impala to effectively use cluster memory, because they cannot explicitly cache important datasets or place their tasks for memory locality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5076: Attachment: HDFS-5076.004.patch Update the patch to address Suresh's comments: add a new method which takes journal id as parameter and returns its status. bq. In journal status should we also return the address and port of the Journal node? Currently we put the MXBean to each JournalNode. Thus when querying the jmx the user should already have the knowledge about the corresponding JournalNode. So I think maybe here we do not need to return the address and port of the JN. Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying these information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5076: Description: Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide support to enable querying this information through JMX, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. Similarly we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not). (was: Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying these information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary.) Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide support to enable querying this information through JMX, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. Similarly we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
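The MXBean approach described above uses the standard JMX platform APIs. A self-contained sketch of how such a bean could be defined and registered (the interface, attribute values, and ObjectName below are hypothetical illustrations, not the actual HDFS-5076 code):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class TxnInfoSketch {
    // The "MXBean" suffix on the interface name is what makes JMX expose it as an MXBean.
    public interface TxnInfoMXBean {
        long getLastAppliedTxId();
        long getMostRecentCheckpointTxId();
    }

    static class TxnInfo implements TxnInfoMXBean {
        public long getLastAppliedTxId() { return 42; }          // dummy value
        public long getMostRecentCheckpointTxId() { return 30; } // dummy value
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("Sketch:name=TxnInfo"); // hypothetical name
        mbs.registerMBean(new TxnInfo(), name);
        // A monitoring tool (or Ambari) can now read the attributes over JMX and
        // trigger saveNamespace when the gap between them grows too large.
        long gap = (Long) mbs.getAttribute(name, "LastAppliedTxId")
                 - (Long) mbs.getAttribute(name, "MostRecentCheckpointTxId");
        System.out.println("transactions since last checkpoint: " + gap);
    }
}
```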
[jira] [Updated] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5076: Summary: Add MXBean methods to query NN's transaction information and JournalNode's journal status (was: Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID) Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. It can be helpful to provide servlets to enable querying these information through http, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740166#comment-13740166 ] Andrew Wang commented on HDFS-5051: --- I included the gen stamp and length in the {{cacheReport}} to handle caching newly appended data. I guess the gen stamp is unnecessary, but the DN isn't going to automatically mlock newly appended data, so the NN needs to somehow realize that the cached length is shorter than the new length and ask the DN to recache at the new length. Alternatively, I guess the DN could automatically mlock appended data, but there are quota implications there. On startup, I agree that we can skip cache reports until the cache is populated. I also agree that jittering doesn't matter as much if it's ticking on such a short time scale. I guess I could have cleaned this up rather than just changing the default cache report period like Colin asked. However, since we want to eventually have both incremental and full reports, let's just ape how block reports work; don't jitter the incremental reports, but do jitter the start time for the full reports and afterwards tick at a regular interval. Let's clean up all these issues in the incremental cache report JIRA (HDFS-5092); if this sounds good, I'll edit the JIRA description with these todo items. Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
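The length-based recache decision sketched above could reduce to a simple comparison on the NameNode side when processing a cache report entry (names here are hypothetical, for illustration only):

```java
public class RecacheCheckSketch {
    // NN-side check: if the DN reports a cached replica shorter than the block's
    // current length (because data was appended after caching), the NN should
    // ask the DN to re-cache the block at the new length.
    static boolean needsRecache(long cachedLength, long currentBlockLength) {
        return cachedLength < currentBlockLength;
    }

    public static void main(String[] args) {
        System.out.println(needsRecache(512, 1024));  // appended data not yet cached
        System.out.println(needsRecache(1024, 1024)); // fully cached, nothing to do
    }
}
```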
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740172#comment-13740172 ] Allen Wittenauer commented on HDFS-5055: new patch appears to be working for me as well. nn-2nn ignores dfs.namenode.secondary.http-address --- Key: HDFS-5055 URL: https://issues.apache.org/jira/browse/HDFS-5055 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Assignee: Vinay Priority: Blocker Labels: regression Attachments: HDFS-5055.patch, HDFS-5055.patch The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740174#comment-13740174 ] Suresh Srinivas commented on HDFS-5051: --- bq. I included the gen stamp and length in the cacheReport to handle caching newly appended data. We need to specify what the cache behavior in this case is. My understanding was that for the first phase new data written will not be cached automatically. In fact any file that is being written to will not be cached until it is closed. Let's clearly define the behavior in these cases. Rest sounds good. Thank you [~andrew.wang] for the comprehensive look at the comments. Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4985: Attachment: h4985.02.patch Updated patch per the design doc on HDFS-2832. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch, HDFS-4985.001.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740200#comment-13740200 ] Andrew Wang commented on HDFS-5051: --- Gotcha, makes sense. I definitely only wanted to address caching finalized blocks at first, but I was thinking about the case where an append+write+close would lead to a finalized block with a new longer length. Let's punt that out to an auto-caching subtask (will file). So, I'll remove the gen stamp and length in HDFS-5092; will edit it with this and the other todo items. Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side.
[jira] [Created] (HDFS-5096) Automatically cache new data added to a cached path
Andrew Wang created HDFS-5096: - Summary: Automatically cache new data added to a cached path Key: HDFS-5096 URL: https://issues.apache.org/jira/browse/HDFS-5096 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Andrew Wang For some applications, it's convenient to specify a path to cache, and have HDFS automatically cache new data added to the path without sending a new caching request or a manual refresh command. One example is new data appended to a cached file. It would be nice to re-cache a block at the new appended length, and cache new blocks added to the file. Another example is a cached Hive partition directory, where a user can drop new files directly into the partition. It would be nice if these new files were cached. In both cases, this automatic caching would happen after the file is closed, i.e., after the block replica is finalized.
[jira] [Resolved] (HDFS-3656) ZKFC may write a null breadcrumb znode
[ https://issues.apache.org/jira/browse/HDFS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3656. --- Resolution: Duplicate Target Version/s: (was: ) Yep, I think you're right. Thanks. ZKFC may write a null breadcrumb znode Key: HDFS-3656 URL: https://issues.apache.org/jira/browse/HDFS-3656 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon A user [reported|https://issues.cloudera.org/browse/DISTRO-412] an NPE trying to read the breadcrumb znode in the failover controller. This happened repeatedly, implying that an earlier process set the znode to null - probably some race, though I don't see anything obvious in the code.
[jira] [Updated] (HDFS-5092) Add support for incremental cache reports
[ https://issues.apache.org/jira/browse/HDFS-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5092: -- Description: The initial {{cacheReport}} patch at HDFS-5051 does frequent full reports of DN cache state. Better would be a scheme similar to how block reports are currently done: send incremental cache reports on every heartbeat (seconds), and full reports on a longer time scale (minutes to hours). This should reduce network traffic and allow us to make incremental reports even faster. As per discussion on HDFS-5051, we should also roll up the following review comments:
- Remove gen stamp and length from {{cacheReport}}, unnecessary until we do auto-caching of appended data
- Only jitter full cache reports, similar to how full block reports are jittered
- On DN startup, skip all cache reports until the cache is populated. The NN can just assume the DN cache is empty in the meantime.
was: We should send incremental cache reports as part of DN heartbeats, similar to how we do incremental block reports. Then we would only need to send full cache reports rarely (again similar to full block reports). Assignee: Andrew Wang Summary: Add support for incremental cache reports (was: piggyback incremental cache reports on DN heartbeats) Add support for incremental cache reports - Key: HDFS-5092 URL: https://issues.apache.org/jira/browse/HDFS-5092 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Priority: Minor The initial {{cacheReport}} patch at HDFS-5051 does frequent full reports of DN cache state. Better would be a scheme similar to how block reports are currently done: send incremental cache reports on every heartbeat (seconds), and full reports on a longer time scale (minutes to hours). This should reduce network traffic and allow us to make incremental reports even faster.
As per discussion on HDFS-5051, we should also roll up the following review comments:
- Remove gen stamp and length from {{cacheReport}}, unnecessary until we do auto-caching of appended data
- Only jitter full cache reports, similar to how full block reports are jittered
- On DN startup, skip all cache reports until the cache is populated. The NN can just assume the DN cache is empty in the meantime.
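The proposed schedule — unjittered incremental reports on each heartbeat, full reports jittered once at startup and then ticking at a regular interval, mirroring full block reports — can be sketched as below. All interval values and names here are illustrative assumptions, not taken from any patch.

```java
import java.util.Random;

public class CacheReportScheduleSketch {
    static final long HEARTBEAT_MS = 3_000;        // incremental report cadence (illustrative)
    static final long FULL_REPORT_MS = 3_600_000;  // full report cadence (illustrative)

    // Incremental reports: no jitter, fire on every heartbeat.
    static long nextIncremental(long lastSent) {
        return lastSent + HEARTBEAT_MS;
    }

    // Full reports: jitter only the first one, so that DNs restarted together
    // don't all send full reports at the same instant...
    static long firstFull(long startupTime, Random rng) {
        return startupTime + (long) (rng.nextDouble() * FULL_REPORT_MS);
    }

    // ...then tick at a regular interval afterwards.
    static long nextFull(long lastFull) {
        return lastFull + FULL_REPORT_MS;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        long full = firstFull(0, rng);
        System.out.println(full >= 0 && full < FULL_REPORT_MS);      // true: jittered start
        System.out.println(nextFull(full) - full == FULL_REPORT_MS); // true: regular cadence
        System.out.println(nextIncremental(0) == HEARTBEAT_MS);      // true: no jitter
    }
}
```

The point of jittering only the first full report is that a fixed interval thereafter keeps reports evenly spread without reintroducing synchronized bursts.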
[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740240#comment-13740240 ] Konstantin Shvachko commented on HDFS-2994: --- Liked the approach of the last patch, which updates the inode reference only when needed. I'd recommend reusing the myFile variable, making it non-final. myFile = INodeFile.valueOf(dir.getINode(src), src, true); This should make it easier to port to other versions. The comment is good, just don't use JavaDoc style. A regular // comment would do better. If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch I saw the following logs on my test cluster: {code}
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
{code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail.
[jira] [Commented] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
[ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740247#comment-13740247 ] Suresh Srinivas commented on HDFS-4898: --- +1 for the patch. We should add a unit test for this. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack - Key: HDFS-4898 URL: https://issues.apache.org/jira/browse/HDFS-4898 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.0.4-alpha Reporter: Eric Sirianni Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h4898_20130809.patch As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not properly fall back to the local rack when no nodes are available in remote racks, resulting in an improper {{NotEnoughReplicasException}}.
{code:title=BlockPlacementPolicyWithNodeGroup.java}
@Override
protected void chooseRemoteRack(int numOfReplicas,
    DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
    long blocksize, int maxReplicasPerRack,
    List<DatanodeDescriptor> results, boolean avoidStaleNodes)
    throws NotEnoughReplicasException {
  int oldNumOfReplicas = results.size();
  // randomly choose one node from remote racks
  try {
    chooseRandom(numOfReplicas,
        "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
        excludedNodes, blocksize, maxReplicasPerRack, results, avoidStaleNodes);
  } catch (NotEnoughReplicasException e) {
    chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
        localMachine.getNetworkLocation(), excludedNodes, blocksize,
        maxReplicasPerRack, results, avoidStaleNodes);
  }
}
{code}
As currently coded, the {{chooseRandom()}} call in the {{catch}} block will never succeed, as the set of nodes within the passed-in node path (e.g. {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes (both are the set of nodes within the same nodegroup as the node chosen for the first replica).
The bug is that the fallback {{chooseRandom()}} call in the catch block should be passing in the _complement_ of the node path used in the initial {{chooseRandom()}} call in the try block (e.g. {{/rack1}}) - namely: {code} NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()) {code} This will yield the proper fallback behavior of choosing a random node from _within the same rack_, but still excluding those nodes _in the same nodegroup_.
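A small self-contained sketch makes the containment argument concrete. The `firstHalf` helper below is a hypothetical stand-in for `NetworkTopology.getFirstHalf`, and the node names and scope-matching logic are illustrative only; the point is that the nodegroup-scoped retry can only see already-excluded nodes, while the rack-scoped retry opens new candidates.

```java
import java.util.*;

public class FallbackScopeSketch {
    // Hypothetical stand-in for NetworkTopology.getFirstHalf: strips the
    // trailing nodegroup from a /rack/nodegroup network location.
    static String firstHalf(String networkLocation) {
        return networkLocation.substring(0, networkLocation.lastIndexOf('/'));
    }

    // Nodes in scope = nodes whose network location starts with the scope path.
    static List<String> inScope(Map<String, String> nodeLocations, String scope) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : nodeLocations.entrySet()) {
            if (e.getValue().startsWith(scope)) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("dn1", "/rack1/nodegroup1");  // holds the first replica
        nodes.put("dn2", "/rack1/nodegroup1");  // same nodegroup: excluded
        nodes.put("dn3", "/rack1/nodegroup2");  // same rack, other nodegroup
        Set<String> excluded = new HashSet<>(Arrays.asList("dn1", "dn2"));

        // Buggy fallback: retry scoped to the full location /rack1/nodegroup1.
        // Every candidate is already excluded, so it can never succeed.
        List<String> buggy = inScope(nodes, "/rack1/nodegroup1");
        buggy.removeAll(excluded);
        System.out.println(buggy.isEmpty());  // true

        // Fixed fallback: retry scoped to firstHalf(...) = /rack1, which still
        // excludes the first replica's nodegroup but admits dn3.
        List<String> fixed = inScope(nodes, firstHalf("/rack1/nodegroup1"));
        fixed.removeAll(excluded);
        System.out.println(fixed);  // [dn3]
    }
}
```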
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4985: Attachment: (was: HDFS-4985.001.patch) Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740261#comment-13740261 ] Konstantin Shvachko commented on HDFS-2994: --- I checked your new test case. It works with patched code and fails with current implementation. But I see tabs, could you please revert to spaces. If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. 
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740275#comment-13740275 ] Jing Zhao commented on HDFS-5055: - {code} String machine = imageListenAddress.getAddress().isAnyLocalAddress() ? null : imageListenAddress.getHostName(); {code} Looks like here if the http address in the configuration is wrong, the UnknownHostException will cause imageListenAddress.getAddress() to return null. We thus may need to add an extra check here. Other than that +1 for the patch. nn-2nn ignores dfs.namenode.secondary.http-address --- Key: HDFS-5055 URL: https://issues.apache.org/jira/browse/HDFS-5055 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Assignee: Vinay Priority: Blocker Labels: regression Attachments: HDFS-5055.patch, HDFS-5055.patch The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured.
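The failure mode Jing describes can be reproduced with plain `java.net.InetSocketAddress`: when the configured host does not resolve, `getAddress()` returns null, so the unguarded `getAddress().isAnyLocalAddress()` throws an NPE. The guarded helper below is a hypothetical illustration of the extra check, not the code from the patch.

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressCheck {
    // Hypothetical guarded version of the reviewed expression: check for an
    // unresolved address before dereferencing getAddress().
    static String machine(InetSocketAddress addr) {
        if (addr.getAddress() == null) {
            return addr.getHostName();  // resolution failed: keep the configured name
        }
        return addr.getAddress().isAnyLocalAddress() ? null : addr.getHostName();
    }

    public static void main(String[] args) {
        // An address whose host was never resolved: getAddress() is null here.
        InetSocketAddress bad =
            InetSocketAddress.createUnresolved("no-such-host.invalid", 50090);
        System.out.println(bad.getAddress() == null);  // true
        System.out.println(machine(bad));              // no-such-host.invalid
    }
}
```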
[jira] [Assigned] (HDFS-5050) Add DataNode support for mlock and munlock
[ https://issues.apache.org/jira/browse/HDFS-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang reassigned HDFS-5050: - Assignee: Andrew Wang Add DataNode support for mlock and munlock -- Key: HDFS-5050 URL: https://issues.apache.org/jira/browse/HDFS-5050 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Add DataNode support for mlock and munlock. The DataNodes should respond to RPCs telling them to mlock and munlock blocks. Blocks should be uncached when the NameNode asks for them to be moved or deleted. For now, we should cache only completed blocks.
[jira] [Commented] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740297#comment-13740297 ] Hadoop QA commented on HDFS-5076: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598048/HDFS-5076.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4822//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4822//console This message is automatically generated. Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. 
It can be helpful to enable querying this information through JMX, so that administrators and applications like Ambari can easily decide if a forced checkpoint by calling saveNamespace is necessary. Similarly, we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not).
[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-2994: -- Attachment: HDFS-2994_4.patch Thanks Konstantin. The newest patch addresses the above two comments. If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5055: -- Attachment: HDFS-5055.1.patch Thanks Jing. Here is the updated patch. nn-2nn ignores dfs.namenode.secondary.http-address --- Key: HDFS-5055 URL: https://issues.apache.org/jira/browse/HDFS-5055 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Assignee: Vinay Priority: Blocker Labels: regression Attachments: HDFS-5055.1.patch, HDFS-5055.patch, HDFS-5055.patch The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured.
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740337#comment-13740337 ] Allen Wittenauer commented on HDFS-5087: Rather than make something custom for heap, doesn't it make more sense to just process the command line parameters and strip duplicates? For example, right now I'm fighting with the NN logging because I want to set hadoop.root.logger differently per service. Is the expectation that we'll create one of these processing loops for all the values? That won't scale. Allowing specific JAVA heap max setting for HDFS related services - Key: HDFS-5087 URL: https://issues.apache.org/jira/browse/HDFS-5087 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Kai Zheng Priority: Minor Attachments: HDFS-5087.patch This allows specific JAVA heap max setting for HDFS related services as it does for YARN services, to be consistent.
[jira] [Commented] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740377#comment-13740377 ] Hadoop QA commented on HDFS-4985: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598059/h4985.02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4823//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4823//console This message is automatically generated. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 datanode now supports storage abstraction. This is to add storage type in to the protocol. Datanodes currently report blocks per storage. Storage would include storage type attribute. 
Namenode also exposes the storage type of a block in block locations.
[jira] [Commented] (HDFS-4994) Audit log getContentSummary() calls
[ https://issues.apache.org/jira/browse/HDFS-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740385#comment-13740385 ] Kihwal Lee commented on HDFS-4994: -- I know getListingInt() does it too, but it will be better if we do logAudit() outside of the FSNamespace lock. We could catch AccessControlException, record the failure, then rethrow. In the finally block, we can then call logAudit() with false if a failure was recorded, otherwise call it with true. Audit log getContentSummary() calls --- Key: HDFS-4994 URL: https://issues.apache.org/jira/browse/HDFS-4994 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.9, 2.3.0 Reporter: Kihwal Lee Assignee: Robert Parker Priority: Minor Labels: newbie Attachments: HDFS-4994_branch-0.23.patch, HDFS-4994.patch Currently getContentSummary() calls are not logged anywhere. They should be logged in the audit log.
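Kihwal's suggested pattern — record the access-control failure inside the locked region, rethrow, and emit the audit entry only after the lock is released — can be sketched as follows. A plain `ReentrantLock` stands in for the namespace lock, and all names, the placeholder return value, and the use of `SecurityException` are illustrative assumptions, not FSNamesystem's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class AuditOutsideLockSketch {
    static final ReentrantLock nsLock = new ReentrantLock();  // stand-in for the namespace lock
    static final List<String> auditLog = new ArrayList<>();

    static void logAudit(boolean success, String op) {
        // Must never run while nsLock is held.
        auditLog.add(op + (success ? ":allowed" : ":denied"));
    }

    static long getContentSummary(String src, boolean permitted) {
        boolean success = false;
        try {
            nsLock.lock();
            try {
                // Permission check and summary computation happen under the lock.
                if (!permitted) throw new SecurityException("Permission denied: " + src);
                success = true;
                return 42L;  // placeholder for the real summary computation
            } finally {
                nsLock.unlock();  // release before auditing
            }
        } finally {
            // Outer finally runs after the inner one, so the lock is free here;
            // a recorded failure is audited as false, otherwise as true.
            logAudit(success, "contentSummary");
        }
    }

    public static void main(String[] args) {
        System.out.println(getContentSummary("/data", true));  // 42
        try {
            getContentSummary("/secret", false);
        } catch (SecurityException e) {
            // the failure was rethrown after being recorded
        }
        System.out.println(auditLog);  // [contentSummary:allowed, contentSummary:denied]
    }
}
```

Ordering the two `finally` blocks this way is the whole trick: the inner one releases the lock, the outer one audits, so the audit write can never extend the lock's critical section.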
[jira] [Commented] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740388#comment-13740388 ] Arpit Agarwal commented on HDFS-4985: - Reattaching the correct patch file. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4985: Attachment: h4985.02.patch Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 the datanode now supports storage abstraction. This is to add the storage type into the protocol. Datanodes currently report blocks per storage. Storage would include a storage type attribute. Namenode also exposes the storage type of a block in block locations.
[jira] [Updated] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-4985:
--------------------------------

    Attachment: (was: h4985.02.patch)
[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services
[ https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740414#comment-13740414 ]

Kai Zheng commented on HDFS-5087:
---------------------------------

bq. just process the command line parameters and strip duplicates?
Rather than applying a post-hoc fix like this, shouldn't we take a consistent approach? Why introduce JAVA_HEAP_MAX at all? I would think we should either always respect it or discard it.

bq. That won't scale.
I'm wondering if it's good practice to add many application options and parameters (like the logging settings) to the Java command line via -D.

Allowing specific JAVA heap max setting for HDFS related services
-----------------------------------------------------------------

                Key: HDFS-5087
                URL: https://issues.apache.org/jira/browse/HDFS-5087
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: scripts
           Reporter: Kai Zheng
           Priority: Minor
        Attachments: HDFS-5087.patch

This allows a specific Java heap max setting for HDFS-related services, as is already done for YARN services, for consistency.
[jira] [Updated] (HDFS-4816) transitionToActive blocks if the SBN is doing checkpoint image transfer
[ https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-4816:
------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.0
           Status: Resolved (was: Patch Available)

Great, thanks atm. Committed to trunk and branch-2.

transitionToActive blocks if the SBN is doing checkpoint image transfer
-----------------------------------------------------------------------

                Key: HDFS-4816
                URL: https://issues.apache.org/jira/browse/HDFS-4816
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: namenode
   Affects Versions: 3.0.0, 2.0.4-alpha
           Reporter: Andrew Wang
           Assignee: Andrew Wang
            Fix For: 2.3.0
        Attachments: hdfs-4816-1.patch, hdfs-4816-2.patch, hdfs-4816-3.patch, hdfs-4816-4.patch, hdfs-4816-slow-shutdown.txt, stacks.out

The NN and SBN do this dance during checkpoint image transfer with nested HTTP GETs via {{HttpURLConnection}}. When an admin runs {{-transitionToActive}} during this transfer, part of that is interrupting the ongoing checkpoint so we can transition immediately. However, the {{thread.interrupt()}} in {{StandbyCheckpointer#stop}} gets swallowed by {{connection.getResponseCode()}} in {{TransferFsImage#doGetUrl}}. None of the methods in HttpURLConnection throw InterruptedException, so we need to do something else (perhaps HttpClient [1]).

[1]: http://hc.apache.org/httpclient-3.x/
[jira] [Commented] (HDFS-4816) transitionToActive blocks if the SBN is doing checkpoint image transfer
[ https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740428#comment-13740428 ]

Hudson commented on HDFS-4816:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4261 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4261/])
HDFS-4816. transitionToActive blocks if the SBN is doing checkpoint image transfer. (Andrew Wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514095)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
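The HDFS-4816 discussion above hinges on the fact that {{Thread#interrupt}} does not wake a thread blocked in {{HttpURLConnection#getResponseCode}}. The general workaround (which the fix applies in spirit) is to have the stopping thread close the underlying resource so the blocked call fails fast with an IOException. A minimal JDK-only sketch of that pattern, using a thread blocked in {{ServerSocket#accept}} as a stand-in for the blocking HTTP call (this is an illustration, not the actual Hadoop patch):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: interrupt() alone would not wake a thread blocked in classic
// java.net I/O; closing the resource from another thread does.
public class BlockingCallCanceller {
    public static boolean cancelBlockedAccept() throws Exception {
        final ServerSocket server = new ServerSocket(0);
        final boolean[] unblocked = new boolean[1];
        Thread worker = new Thread(() -> {
            try {
                server.accept();     // blocks, ignoring Thread#interrupt
            } catch (IOException e) {
                unblocked[0] = true; // close() from another thread lands here
            }
        });
        worker.start();
        Thread.sleep(200);           // give the worker time to block in accept()
        server.close();              // "cancel" the blocking call
        worker.join(5000);
        return unblocked[0] && !worker.isAlive();
    }
}
```

The same idea applies to {{HttpURLConnection}} via {{disconnect()}}, or by switching to an HTTP client that exposes an abort operation, as the [1] link in the issue suggests.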
[jira] [Updated] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-4504:
---------------------------------------

    Attachment: HDFS-4504.014.patch
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740434#comment-13740434 ]

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

The latest patch:
* When reaping zombie files, don't use recoverLease. Instead, add a force flag to completeFile.
* Add {{dfs.client.close.timeout.ms}} to specify how long we should wait inside close() before making the file a zombie. Previously we used {{ipc.ping.interval}} to determine how long to wait. Having a configuration option for this makes unit tests that exercise close() against an unresponsive namenode much simpler to write.
* {{FSNamesystem#completeFile}} should issue a different log message on failure than on success.
* {{TestHFlush#testHFlushInterrupted}}: Thread#interrupted is a static method; refer to it statically to avoid a Java warning. Clear interrupted status when appropriate.

Since this is a bigger change, I added small whitespace changes in hadoop-mapreduce-client, hadoop-yarn, and hadoop-tools to get a full test run, so that we can become aware of any issues.
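The Thread#interrupted point in the comment above is subtle enough to be worth spelling out: {{Thread.interrupted()}} is a static method that reports and clears the current thread's interrupt flag, while {{isInterrupted()}} only reports it; calling the static method through an instance reference compiles but draws a Java warning. A small self-contained illustration:

```java
// Demonstrates Thread.interrupted() (static, reports AND clears the flag)
// versus isInterrupted() (instance, reports without clearing).
public class InterruptFlagDemo {
    public static boolean[] probe() {
        Thread.currentThread().interrupt();          // set the flag
        boolean first  = Thread.interrupted();       // true, and clears the flag
        boolean second = Thread.interrupted();       // false: already cleared
        Thread.currentThread().interrupt();          // set it again
        boolean kept   = Thread.currentThread().isInterrupted(); // true, flag kept
        boolean last   = Thread.interrupted();       // true, and clears for cleanup
        return new boolean[] { first, second, kept, last };
    }
}
```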
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740453#comment-13740453 ]

Konstantin Shvachko commented on HDFS-5079:
-------------------------------------------

+1 Looks good. No test needed for code removal.

Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
-------------------------------------------------------------

                Key: HDFS-5079
                URL: https://issues.apache.org/jira/browse/HDFS-5079
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: datanode, namenode
   Affects Versions: 3.0.0
           Reporter: Konstantin Shvachko
           Assignee: Tao Luo
        Attachments: HDFS-5079.patch

NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos.
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740452#comment-13740452 ]

Jing Zhao commented on HDFS-5055:
---------------------------------

The new patch looks pretty good to me. +1.

nn-2nn ignores dfs.namenode.secondary.http-address
--------------------------------------------------

                Key: HDFS-5055
                URL: https://issues.apache.org/jira/browse/HDFS-5055
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: namenode
   Affects Versions: 2.1.0-beta
           Reporter: Allen Wittenauer
           Assignee: Vinay
           Priority: Blocker
             Labels: regression
        Attachments: HDFS-5055.1.patch, HDFS-5055.patch, HDFS-5055.patch

The primary namenode attempts to connect back to (incoming hostname):port regardless of how dfs.namenode.secondary.http-address is configured.
[jira] [Updated] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5076:
----------------------------

    Attachment: HDFS-5076.005.patch

After some offline discussion with Suresh, we think an MXBean method getJournalStatus(String jid) may not be a good idea: a bad jid would also cause the creation of a corresponding Journal object in the JN, and this may allow malicious users to attack the JN. Because Journal objects are created lazily, it is possible that a journal has been formatted but is not included in the journalsById list because of a JN restart. In the current patch we simply assume that if a directory has been created in the journal dir, the corresponding journal has been formatted. We can also call the analyzeStorage method to verify, if necessary.

Add MXBean methods to query NN's transaction information and JournalNode's journal status
-----------------------------------------------------------------------------------------

                Key: HDFS-5076
                URL: https://issues.apache.org/jira/browse/HDFS-5076
            Project: Hadoop HDFS
         Issue Type: New Feature
   Affects Versions: 3.0.0
           Reporter: Jing Zhao
           Assignee: Jing Zhao
           Priority: Minor
        Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch, HDFS-5076.005.patch

Currently the NameNode already provides RPC calls to get its last applied transaction ID and the most recent checkpoint's transaction ID. It would be helpful to support querying this information through JMX, so that administrators and applications like Ambari can easily decide whether a forced checkpoint via saveNamespace is necessary. Similarly, we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether journals are formatted or not).
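The lazy-creation concern in the comment above generalizes: a status query that instantiates a server-side object for any identifier it is handed lets arbitrary (possibly malicious) identifiers allocate state. A hedged sketch of the safer pattern, with plain collections standing in for the JournalNode's internals (the class and method names here are illustrative, not the Hadoop API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: validate a journal id against what actually exists on disk
// before (or instead of) lazily creating a Journal object for it.
public class JournalRegistrySketch {
    private final Map<String, Object> journalsById = new HashMap<>();
    private final Set<String> formattedOnDisk; // stand-in for existing storage dirs

    public JournalRegistrySketch(Set<String> formattedOnDisk) {
        this.formattedOnDisk = formattedOnDisk;
    }

    // Unsafe variant: allocates an object for ANY jid the caller supplies.
    public Object getOrCreate(String jid) {
        return journalsById.computeIfAbsent(jid, k -> new Object());
    }

    // Safer variant: answer status only for journals whose directory exists;
    // no server-side object is created for a bad jid.
    public String getStatus(String jid) {
        return formattedOnDisk.contains(jid) ? "FORMATTED" : "UNKNOWN";
    }

    public int cachedJournalCount() {
        return journalsById.size();
    }
}
```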
[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740494#comment-13740494 ]

Hadoop QA commented on HDFS-2994:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598074/HDFS-2994_4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4825//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/4825//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4825//console

This message is automatically generated.

If lease is recovered successfully inline with create, create can fail
----------------------------------------------------------------------

                Key: HDFS-2994
                URL: https://issues.apache.org/jira/browse/HDFS-2994
            Project: Hadoop HDFS
         Issue Type: Bug
   Affects Versions: 0.24.0
           Reporter: Todd Lipcon
           Assignee: amith
        Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch

I saw the following logs on my test cluster:
{code}
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
{code}
It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail.
[jira] [Commented] (HDFS-5055) nn-2nn ignores dfs.namenode.secondary.http-address
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740493#comment-13740493 ]

Hadoop QA commented on HDFS-5055:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598075/HDFS-5055.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4824//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4824//console

This message is automatically generated.
[jira] [Updated] (HDFS-5055) nn fails to download checkpointed image from snn in some setups
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-5055:
----------------------------------

    Summary: nn fails to download checkpointed image from snn in some setups (was: nn-2nn ignores dfs.namenode.secondary.http-address)
[jira] [Updated] (HDFS-5055) nn fails to download checkpointed image from snn in some setups
[ https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-5055:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.1.1-beta
           Status: Resolved (was: Patch Available)

Committed the patch to trunk, branch-2 and branch-2.1. Thanks Jing for the review and Allen for verifying it works. Thank you Vinay for the patch!
[jira] [Created] (HDFS-5097) TestDoAsEffectiveUser can fail on JDK 7
Aaron T. Myers created HDFS-5097:
------------------------------------

            Summary: TestDoAsEffectiveUser can fail on JDK 7
                Key: HDFS-5097
                URL: https://issues.apache.org/jira/browse/HDFS-5097
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: test
   Affects Versions: 2.1.0-beta
           Reporter: Aaron T. Myers
           Assignee: Aaron T. Myers
           Priority: Minor

Another issue with the test method execution order changing between JDK 6 and 7.
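For background on the "execution order" failures mentioned above: {{Class#getDeclaredMethods}} returns methods in no guaranteed order, and JDK 7 changed the de-facto order that JDK 6 happened to produce, so test frameworks that enumerate test methods reflectively started running them in a different sequence. Tests that silently depend on that order break; sorting by name (or removing the order dependence entirely) makes the order deterministic. A small sketch of the mechanism:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Demonstrates that reflective method enumeration has no specified order,
// and that sorting by name yields a deterministic sequence regardless of JDK.
public class MethodOrderDemo {
    static class Sample {
        public void testB() {}
        public void testA() {}
        public void testC() {}
    }

    public static String[] sortedTestNames() {
        Method[] methods = Sample.class.getDeclaredMethods(); // unspecified order
        String[] names = new String[methods.length];
        for (int i = 0; i < methods.length; i++) {
            names[i] = methods[i].getName();
        }
        Arrays.sort(names);                                   // deterministic order
        return names;
    }
}
```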
[jira] [Commented] (HDFS-5077) NPE in FSNamesystem.commitBlockSynchronization()
[ https://issues.apache.org/jira/browse/HDFS-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740512#comment-13740512 ]

Konstantin Shvachko commented on HDFS-5077:
-------------------------------------------

Yes, this is pretty rare, but I hit it (while testing HA), so others could too. The reason is similar to yours. But whatever the reason, we should fix the NPE.

NPE in FSNamesystem.commitBlockSynchronization()
------------------------------------------------

                Key: HDFS-5077
                URL: https://issues.apache.org/jira/browse/HDFS-5077
            Project: Hadoop HDFS
         Issue Type: Bug
         Components: namenode
   Affects Versions: 2.0.5-alpha
           Reporter: Konstantin Shvachko

The NN starts a block recovery, which will synchronize block replicas on different DNs. At the end, one of the DNs reports the list of nodes containing the consistent replicas to the NN via the commitBlockSynchronization() call. The NPE happens if, just before processing commitBlockSynchronization(), the NN removes from the active set one of the DNs that is then reported in the call.
[jira] [Created] (HDFS-5098) Enhance FileSystem.Statistics to have locality information
Bikas Saha created HDFS-5098:
--------------------------------

            Summary: Enhance FileSystem.Statistics to have locality information
                Key: HDFS-5098
                URL: https://issues.apache.org/jira/browse/HDFS-5098
            Project: Hadoop HDFS
         Issue Type: Improvement
           Reporter: Bikas Saha
            Fix For: 2.1.1-beta

Currently in MR/Tez we don't have a good, accurate means of detecting how much of the IO was actually done locally. Getting this information from the source of truth would be much better.
[jira] [Commented] (HDFS-5004) Add additional JMX bean for NameNode status data
[ https://issues.apache.org/jira/browse/HDFS-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740532#comment-13740532 ]

Konstantin Shvachko commented on HDFS-5004:
-------------------------------------------

Cos, you need to move the record about this jira in CHANGES.txt on trunk under the 2.3.0 section. It is inconsistent now.

Add additional JMX bean for NameNode status data
------------------------------------------------

                Key: HDFS-5004
                URL: https://issues.apache.org/jira/browse/HDFS-5004
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: namenode
   Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha
           Reporter: Trevor Lorimer
           Assignee: Trevor Lorimer
            Fix For: 3.0.0, 2.3.0
        Attachments: HDFS-5004.diff, HDFS-5004.diff, HDFS-5004.diff

Currently the JMX beans return much of the data contained on the HDFS Health webpage (dfsHealth.html). However, several other attributes that can only be accessed from within NameNode need to be added. For this reason a new JMX bean (NameNodeStatusMXBean) is required, which will expose the following attributes of NameNode: Role, State, HostAndPort. Also, a list of the corrupted files should be exposed by NameNodeMXBean.
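Mechanically, an MXBean like the proposed NameNodeStatusMXBean is just an interface whose name ends in "MXBean" plus a registration in the platform MBeanServer. The sketch below uses only the JDK's javax.management API; the attribute names (Role, State, HostAndPort) come from the issue description, while the bean class, ObjectName, and returned values are illustrative, not the actual Hadoop implementation:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal MXBean: define an interface ending in "MXBean", implement it,
// register it, and read an attribute back through JMX.
public class StatusMXBeanDemo {
    public interface DemoStatusMXBean {
        String getRole();
        String getState();
        String getHostAndPort();
    }

    public static class DemoStatus implements DemoStatusMXBean {
        public String getRole() { return "NameNode"; }
        public String getState() { return "active"; }
        public String getHostAndPort() { return "nn1.example.com:8020"; }
    }

    public static String readRoleViaJmx() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=DemoStatus");
        server.registerMBean(new DemoStatus(), name);
        // The getter getRole() surfaces as the JMX attribute "Role".
        return (String) server.getAttribute(name, "Role");
    }
}
```

Tools such as Ambari (or plain jconsole) can then read these attributes remotely without any RPC-client code.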
[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740531#comment-13740531 ]

Suresh Srinivas commented on HDFS-5080:
---------------------------------------

+1 for the patch.

BootstrapStandby not working with QJM when the existing NN is active
--------------------------------------------------------------------

                Key: HDFS-5080
                URL: https://issues.apache.org/jira/browse/HDFS-5080
            Project: Hadoop HDFS
         Issue Type: Bug
   Affects Versions: 3.0.0
           Reporter: Jing Zhao
           Assignee: Jing Zhao
        Attachments: HDFS-5080.000.patch, HDFS-5080.001.patch, HDFS-5080.002.patch

Currently when QJM is used, running BootstrapStandby while the existing NN is active can produce the following exception:
{code}
FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 from the configured shared edits storage. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 6175405 but unable to find any edit logs containing txid 6175405
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 6175405 but unable to find any edit logs containing txid 6175405
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258)
        at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229)
{code}
It looks like the cause of the exception is that, when the active NN is queried by BootstrapStandby about the last written transaction ID, the in-progress edit log segment is included. However, when the journal nodes are asked about the last written transaction ID, the in-progress edit log is excluded. This causes BootstrapStandby#checkLogsAvailableForRead to complain about gaps.

To fix this, we can either let the journal nodes take the in-progress edit log into account, or let the active NN exclude the in-progress edit log segment.
[jira] [Updated] (HDFS-5068) Convert NNThroughputBenchmark to a Tool to allow generic options.
[ https://issues.apache.org/jira/browse/HDFS-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-5068:
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.0
           Status: Resolved (was: Patch Available)

Committed this to trunk and branch-2.3. Will move it further down if requested.

Convert NNThroughputBenchmark to a Tool to allow generic options.
-----------------------------------------------------------------

                Key: HDFS-5068
                URL: https://issues.apache.org/jira/browse/HDFS-5068
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: benchmarks
           Reporter: Konstantin Shvachko
           Assignee: Konstantin Shvachko
            Fix For: 2.3.0
        Attachments: NNThBenchTool.patch, NNThBenchTool.patch

Currently NNThroughputBenchmark does not recognize generic options like -conf, etc. A simple way to enable such functionality is to make it implement the Tool interface.
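What implementing Tool buys, concretely, is that ToolRunner peels generic options such as "-conf file" and "-D key=value" off the command line before the program sees its own arguments. The parser below is a simplified stand-in written to illustrate that split; it is not Hadoop's GenericOptionsParser API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of generic-option stripping: consume "-conf" and
// "-D" (with their values) and hand the remaining args to the tool itself.
public class GenericOptionsSketch {
    public static final Map<String, String> genericOpts = new HashMap<>();

    // Returns the remaining, tool-specific arguments.
    public static String[] stripGenericOptions(String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-conf") || args[i].equals("-D")) {
                genericOpts.put(args[i], args[++i]); // consume the option's value
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }
}
```

With Tool, a command line like "NNThroughputBenchmark -conf my-site.xml -op create" would have "-conf my-site.xml" applied to the Configuration automatically, leaving "-op create" for the benchmark.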
[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active
[ https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740542#comment-13740542 ]

Jing Zhao commented on HDFS-5080:
---------------------------------

Thanks for the review, Suresh! I plan to commit this early tomorrow morning if there are no more comments. We can open new jiras for any comments that come in after committing.
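The HDFS-5080 "Gap in transactions" failure above comes down to a coverage check over edit log segments: if the active NN reports a last txid that includes its in-progress segment while the journal nodes report only finalized segments, the check finds txids it cannot cover and reports a gap even though no edits were lost. A simplified stand-in for the idea behind FSEditLog#checkForGaps (not the actual Hadoop code):

```java
// Given sorted, non-overlapping edit log segments as {startTxId, endTxId}
// ranges, report whether any txid in [fromTxId, toTxId] is uncovered.
public class TxidGapCheck {
    public static boolean hasGap(long[][] segments, long fromTxId, long toTxId) {
        long next = fromTxId;                      // next txid we still need
        for (long[] seg : segments) {
            if (seg[0] > next) {
                return true;                       // txids missing before this segment
            }
            next = Math.max(next, seg[1] + 1);     // covered through seg[1]
        }
        return next <= toTxId;                     // true if coverage stops short
    }
}
```

With finalized segments [1,100] and [101,200], checking up to txid 200 finds no gap, but checking up to a txid taken from an in-progress segment the journal nodes excluded (say 205) reports one, which is exactly the mismatch the two proposed fixes would remove.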
[jira] [Updated] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-5079: -- Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Status: Resolved (was: Patch Available) I just committed this to trunk. Thank you Tao. Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 3.0.0 Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740546#comment-13740546 ] Hadoop QA commented on HDFS-4504: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598096/HDFS-4504.014.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-distcp hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.TestHdfsClose org.apache.hadoop.hdfs.TestFileAppend4 org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-distcp hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.hdfs.TestFileAppend3 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4827//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4827//console This message is automatically generated. DFSOutputStream#close doesn't always release resources (such as leases) --- Key: HDFS-4504 URL: https://issues.apache.org/jira/browse/HDFS-4504 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases. So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the undead file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740551#comment-13740551 ] Colin Patrick McCabe commented on HDFS-4953: bq. brandon wrote: 1. DFSClient: looks like all the DFSClient instances share the same ClientMmapManager instance. If this is the case, why not have one static ClientMmapManager with a refcount to it, and remove ClientMmapManagerFactory class and variable mmapManager? I think it's better to have the refcount and manager instance encapsulated as private data inside an object, rather than floating around in the DFSClient class, because it prevents errors where someone might access the field without updating the reference count properly. bq. 2. HdfsZeroCopyCursor: might want to also initialize allowShortReads in the constructor. Users can't create instances of HdfsZeroCopyCursor directly (it's package-private). {{DFSInputStream#createZeroCopyConstructor}} creates them. We could start adding booleans to this function, but it seems clearer for people to just use setAllowShortReads. The kind of mess we have with FileSystem#create where there are a dozen different overloads and nobody can keep them straight is an antipattern. bq. ... Not sure which case is more expected by the users, shortReads allowed or disallowed. That's a good question. My experience has been that many developers don't handle short reads very well (sometimes including me). It's just another corner case that they have to remember to handle, and if they're not FS developers they often don't even realize that it can happen. So I have defaulted it to off unless it's explicitly requested. bq. 4. DFSInputStream: remove unused import and add debug level check for DFSClient.LOG.Debug(). OK bq. 5. TestBlockReader: Assume.assumeTrue(SystemUtils.IS_OS_UNIX), guess you meant IS_OS_LINUX mmap is present and supported on other UNIXes besides Linux bq. 6. test_libhdfs_zerocopy.c: remove repeated fixed bq. 7. 
TestBlockReaderLocal.java: remove unused import ok bq. 8. please add javadoc to some classes, e.g., ClientMap,ClientMapManager ok bq. andrew wrote: [hdfs-default.xml] Has some extra lines of java pasted in. fixed bq. Let's beef up the [zerocopycursor] javadoc I added an example. bq. read() javadoc: EOF here refers to an EOF when reading a block, not EOF of the HDFS file. Would prefer to see end of block. EOF is only thrown at end-of-file, as described in the JavaDoc. bq. Would like to see explicit setting of allowShortReads to false in the constructor for clarity. done bq. serialVersionUID should be private ok bq. Maybe rename put to unref or close? It's not actually putting in the data structure sense, which is confusing. renamed to unref bq. let's not call people bad programmers, just say accidentally leaked references. I changed this to code which leaks references accidentally to make it more boring bq. unmap: add to javadoc that it should only be called if the manager has been closed, or by the manager with the lock held. I added Should be called with the ClientMmapManager lock held bq. Need a space before the =. ok bq. Let's add some javadoc on... why it's important to cache [mmaps] added to ClientMmapManager bq. I think fromConf-style factory methods are more normally called get, e.g. FileSystem.get. FileSystem#get uses a cache, whereas ClientMmapManager#fromConf does not. I think it would be confusing to name them similarly... bq. Why is the CacheCleaner executor using half the timeout for the delay and period? Half the timeout period is the minimum period for which we can ensure that we time out mmaps on time. Think about if we used the timeout itself as the period. In that case, we might be 1 second away from the 15-minute (or whatever) expiration period when the cleaner thread runs. Then we have to wait another 15 minutes, effectively doubling the timeout. bq. We might in fact want to key off of System.nanoTime for fewer collisions Good point; changed. 
bq. I think evictOne would be clearer if you used TreeSet#pollFirst rather than an iterator. yeah, changed bq. This has 10 spaces, where elsewhere in the file you use a double-indent of 4. ok, I'll make it 4 bq. Remaining TODO for blocks bigger than 2GB, want to file a follow-on JIRA for this? filed bq. readZeroCopy catches and re-sets the interrupted status, does something else check this later? No. It would only happen if some third-party software delivered an InterruptedException to us. In that case the client is responsible for checking and doing something with the InterruptedException (or not). This all happens in the client thread. bq. Is it worth re-trying the mmap after a CacheCleaner period in case some space has been freed up in the cache? BlockReader objects get destroyed and re-created a lot. For example, a long seek
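The half-timeout argument for the CacheCleaner period above is simple arithmetic: an mmap that expires just after a sweep waits one full sweep period before the cleaner sees it again, so the period bounds how far past the timeout an entry can live. A tiny sketch (method and class names are illustrative, not from the patch):

```java
public class CacheSweepSketch {
    // Worst case: an entry reaches its timeout immediately after a sweep
    // finishes, so it survives one extra full period before eviction.
    static long worstCaseLifetimeMs(long timeoutMs, long sweepPeriodMs) {
        return timeoutMs + sweepPeriodMs;
    }

    public static void main(String[] args) {
        long timeout = 15 * 60_000L; // 15-minute mmap timeout
        // Sweeping every `timeout` ms can double the effective timeout:
        System.out.println(worstCaseLifetimeMs(timeout, timeout));     // 30 min
        // Sweeping every timeout/2 ms caps the overshoot at 50%:
        System.out.println(worstCaseLifetimeMs(timeout, timeout / 2)); // 22.5 min
    }
}
```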
[jira] [Commented] (HDFS-5068) Convert NNThroughputBenchmark to a Tool to allow generic options.
[ https://issues.apache.org/jira/browse/HDFS-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740554#comment-13740554 ] Hudson commented on HDFS-5068: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4262/]) HDFS-5068. Convert NNThroughputBenchmark to a Tool to allow generic options. Contributed by Konstantin Shvachko. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1514114) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java Convert NNThroughputBenchmark to a Tool to allow generic options. - Key: HDFS-5068 URL: https://issues.apache.org/jira/browse/HDFS-5068 Project: Hadoop HDFS Issue Type: Improvement Components: benchmarks Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.3.0 Attachments: NNThBenchTool.patch, NNThBenchTool.patch Currently NNThroughputBenchmark does not recognize generic options like -conf, etc. A simple way to enable such functionality is to make it implement Tool interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode
[ https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740555#comment-13740555 ] Hudson commented on HDFS-5051: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4262/]) HDFS-5051. nn fails to download checkpointed image from snn in some setups. Contributed by Vinay and Suresh Srinivas. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1514110) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java Propagate cache status information from the DataNode to the NameNode Key: HDFS-5051 URL: https://issues.apache.org/jira/browse/HDFS-5051 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch The DataNode needs to inform the NameNode of its current cache state. Let's wire up the RPCs and stub out the relevant methods on the DN and NN side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5079) Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.
[ https://issues.apache.org/jira/browse/HDFS-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740556#comment-13740556 ] Hudson commented on HDFS-5079: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4262/]) HDFS-5079. Cleaning up NNHAStatusHeartbeat.State from DatanodeProtocolProtos. Contributed by Tao Luo. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1514118) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos. - Key: HDFS-5079 URL: https://issues.apache.org/jira/browse/HDFS-5079 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Fix For: 3.0.0 Attachments: HDFS-5079.patch NNHAStatusHeartbeat.State was removed from usage by HDFS-4268. The respective class should also be removed from DatanodeProtocolProtos. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4985) Add storage type to the protocol and expose it in block report and block locations
[ https://issues.apache.org/jira/browse/HDFS-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740564#comment-13740564 ] Hadoop QA commented on HDFS-4985: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598084/h4985.02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4826//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4826//console This message is automatically generated. Add storage type to the protocol and expose it in block report and block locations -- Key: HDFS-4985 URL: https://issues.apache.org/jira/browse/HDFS-4985 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Arpit Agarwal Attachments: h4985.02.patch With HDFS-2880 datanode now supports storage abstraction. This is to add storage type in to the protocol. Datanodes currently report blocks per storage. 
Storage would include storage type attribute. Namenode also exposes the storage type of a block in block locations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
Chuan Liu created HDFS-5099: --- Summary: Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing Key: HDFS-5099 URL: https://issues.apache.org/jira/browse/HDFS-5099 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu In the {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read edits and apply them to the shared edit log. The {{readOpt()}} method opens the underlying log file on disk. Currently, after applying all the ops, we do not close the streams. This leads to a file handle leak on Windows, as we later fail to delete those files. This happens in the TestInitializeSharedEdits test case, where we explicitly call {{Namenode#initializeSharedEdits()}}, which uses {{copyEditLogSegmentsToSharedDir()}}. Later we fail to create a new MiniDFSCluster with the following exception. {noformat} java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316) at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) … {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
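The fix described later in this thread is the standard close-in-finally idiom. A minimal, self-contained sketch follows; {{TrackingStream}} is a hypothetical stand-in for EditLogInputStream, and the real patch would close the actual streams (e.g. via Hadoop's IOUtils) rather than this toy class:

```java
import java.io.Closeable;
import java.util.List;

public class CloseStreamsSketch {
    // Hypothetical stand-in for EditLogInputStream, recording whether
    // close() was called so the cleanup can be verified.
    public static class TrackingStream implements Closeable {
        public boolean closed = false;
        @Override public void close() { closed = true; }
    }

    public static void copySegments(List<TrackingStream> streams) {
        try {
            // ... read ops from each stream and apply them to the
            // shared edit log ...
        } finally {
            // Close every stream even if reading fails, releasing the
            // underlying file handles (the Windows "Could not fully
            // delete" failure came from leaked handles).
            for (TrackingStream s : streams) {
                try {
                    s.close();
                } catch (Exception e) {
                    // best-effort cleanup; keep closing the rest
                }
            }
        }
    }
}
```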
[jira] [Updated] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated HDFS-5099: Status: Patch Available (was: Open) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing --- Key: HDFS-5099 URL: https://issues.apache.org/jira/browse/HDFS-5099 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Attachments: HDFS-5099-trunk.patch In {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read and apply to shareEditlog. In {{readOpt()}} method, we will open the underlying log file on disk. After applying all the opts, we do not close the collection of streams currently. This lead to a file handle leak on Windows as later we would fail to delete those files. This happens in TestInitializeSharedEdits test case, where we explicitly called {{Namenode# initializeSharedEdits()}}, where {{copyEditLogSegmentsToSharedDir()}} is used. Later we fail to create new MiniDFSCluster with the following exception. {noformat} java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316) at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) … {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated HDFS-5099: Attachment: HDFS-5099-trunk.patch Attaching the patch that closes all the streams in the finally clause. All other code changes are just indentation for the new try clause. Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing --- Key: HDFS-5099 URL: https://issues.apache.org/jira/browse/HDFS-5099 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Attachments: HDFS-5099-trunk.patch In {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read and apply to shareEditlog. In {{readOpt()}} method, we will open the underlying log file on disk. After applying all the opts, we do not close the collection of streams currently. This lead to a file handle leak on Windows as later we would fail to delete those files. This happens in TestInitializeSharedEdits test case, where we explicitly called {{Namenode# initializeSharedEdits()}}, where {{copyEditLogSegmentsToSharedDir()}} is used. Later we fail to create new MiniDFSCluster with the following exception. {noformat} java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316) at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) … {noformat} -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-2994: -- Attachment: HDFS-2994_4.patch If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo updated HDFS-2994: -- Attachment: (was: HDFS-2994_4.patch) If lease is recovered successfully inline with create, create can fail -- Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: amith Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-5076) Add MXBean methods to query NN's transaction information and JournalNode's journal status
[ https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740602#comment-13740602 ] Hadoop QA commented on HDFS-5076: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598112/HDFS-5076.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4828//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4828//console This message is automatically generated. Add MXBean methods to query NN's transaction information and JournalNode's journal status - Key: HDFS-5076 URL: https://issues.apache.org/jira/browse/HDFS-5076 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, HDFS-5076.003.patch, HDFS-5076.004.patch, HDFS-5076.005.patch Currently NameNode already provides RPC calls to get its last applied transaction ID and most recent checkpoint's transaction ID. 
It would be helpful to support querying this information through JMX, so that administrators and applications like Ambari can easily decide whether a forced checkpoint via saveNamespace is necessary. Similarly, we can add an MXBean interface for JournalNodes to query the status of journals (e.g., whether they are formatted or not). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
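As an illustration of the Ambari-style decision the comment describes, here is a hedged sketch using a locally registered MXBean. The interface, attribute names, and threshold are hypothetical and will not match the NameNode's actual bean; it only shows the JMX read-and-compare pattern:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class NNTxJmxSketch {
    // Hypothetical MXBean shape; the real NameNode bean exposes
    // different names and lives under a Hadoop JMX domain.
    public interface TxInfoMXBean {
        long getLastAppliedTransactionId();
        long getMostRecentCheckpointTxId();
    }

    // Fake bean standing in for a NameNode with uncheckpointed edits.
    public static class TxInfo implements TxInfoMXBean {
        public long getLastAppliedTransactionId() { return 6175405L; }
        public long getMostRecentCheckpointTxId() { return 6100000L; }
    }

    // An admin tool would read the two attributes over JMX and force a
    // checkpoint once the uncheckpointed transaction count grows large.
    public static boolean needsCheckpoint(MBeanServer mbs, ObjectName name,
                                          long threshold) throws Exception {
        long last = (Long) mbs.getAttribute(name, "LastAppliedTransactionId");
        long ckpt = (Long) mbs.getAttribute(name, "MostRecentCheckpointTxId");
        return last - ckpt > threshold;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch:type=TxInfo");
        mbs.registerMBean(new TxInfo(), name);
        // 6175405 - 6100000 = 75405 pending txns, above the threshold:
        System.out.println(needsCheckpoint(mbs, name, 50_000L)); // true
    }
}
```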
[jira] [Updated] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-4953: --- Attachment: HDFS-4953.007.patch new patch version. I added the tests for the cache, and a java test for no backing buffer. enable HDFS local reads via mmap Key: HDFS-4953 URL: https://issues.apache.org/jira/browse/HDFS-4953 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.3.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, HDFS-4953.006.patch, HDFS-4953.007.patch Currently, the short-circuit local read pathway allows HDFS clients to access files directly without going through the DataNode. However, all of these reads involve a copy at the operating system level, since they rely on the read() / pread() / etc family of kernel interfaces. We would like to enable HDFS to read local files via mmap. This would enable truly zero-copy reads. In the initial implementation, zero-copy reads will only be performed when checksums were disabled. Later, we can use the DataNode's cache awareness to only perform zero-copy reads when we know that checksum has already been verified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740644#comment-13740644 ] Hadoop QA commented on HDFS-5099: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598132/HDFS-5099-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4829//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4829//console This message is automatically generated. 
Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
---------------------------------------------------------------------------------------

                 Key: HDFS-5099
                 URL: https://issues.apache.org/jira/browse/HDFS-5099
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.0, 2.3.0
            Reporter: Chuan Liu
            Assignee: Chuan Liu
         Attachments: HDFS-5099-trunk.patch

In the {{Namenode#copyEditLogSegmentsToSharedDir()}} method, we open a collection of EditLogInputStreams to read edit ops and apply them to the shared edit log. The {{readOp()}} method opens the underlying log file on disk. After applying all the ops, we currently do not close the collection of streams. This leads to a file handle leak on Windows, as we later fail to delete those files. It happens in the TestInitializeSharedEdits test case, where we explicitly call {{Namenode#initializeSharedEdits()}}, which uses {{copyEditLogSegmentsToSharedDir()}}. Later we fail to create a new MiniDFSCluster with the following exception.

{noformat}
java.io.IOException: Could not fully delete C:\hdc\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1
	at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759)
	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644)
	at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:334)
	at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316)
	at org.apache.hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits.setupCluster(TestInitializeSharedEdits.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	…
{noformat}
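The fix described above amounts to closing every opened stream in a finally block once the ops have been applied, so the file handles are released before anything tries to delete the files. A minimal stand-alone sketch under that assumption (the class name and temp file are hypothetical, not the actual Namenode code):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class CloseStreamsSketch {
    public static void main(String[] args) throws IOException {
        // Stand-ins for the EditLogInputStreams opened over edit log segments.
        List<InputStream> streams = new ArrayList<>();
        File segment = File.createTempFile("edits_", ".log");
        try {
            streams.add(new FileInputStream(segment));
            // ... read ops from each stream and apply them ...
        } finally {
            // Close every stream before returning; on Windows, deleting a
            // file fails while any handle to it remains open.
            for (InputStream s : streams) {
                try {
                    s.close();
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
        }
        // Deletion succeeds once all handles are closed.
        System.out.println(segment.delete());
    }
}
```

In Hadoop itself the same pattern is commonly expressed with a cleanup utility that swallows close-time exceptions, which is what the per-stream try/catch above imitates.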
[jira] [Commented] (HDFS-5099) Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
[ https://issues.apache.org/jira/browse/HDFS-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740649#comment-13740649 ]

Chuan Liu commented on HDFS-5099:
---------------------------------

bq. -1 tests included. The patch doesn't appear to include any new or modified tests.

The existing unit test TestInitializeSharedEdits should be able to verify this behavior.