[jira] [Commented] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than block size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245030#comment-13245030 ]

Hadoop QA commented on HDFS-3181:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12521105/repro.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestLeaseRecovery2
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//console

This message is automatically generated.

testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than block size
----------------------------------------------------------------------------------------------------------

                 Key: HDFS-3181
                 URL: https://issues.apache.org/jira/browse/HDFS-3181
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Colin Patrick McCabe
            Priority: Critical
         Attachments: TestLeaseRecovery2with1535.patch, repro.txt, testOut.txt

org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins.

{code}
org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart

Failing for the past 1 build (Since Failed#2163 )
Took 8.4 sec.
Error Message

Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)

Stacktrace

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
	at
[jira] [Updated] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liaowenrui updated HDFS-3175:
-----------------------------

    Attachment: HDFS-3175.patch
                HDFS-3175.patch

When the disk space is available back,Namenode resource monitor can automatically take off safemode.
-----------------------------------------------------------------------------------------------------

                 Key: HDFS-3175
                 URL: https://issues.apache.org/jira/browse/HDFS-3175
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 0.24.0, 2.0.0
            Reporter: liaowenrui
         Attachments: HDFS-3175.patch, HDFS-3175.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245066#comment-13245066 ]

liaowenrui commented on HDFS-3175:
----------------------------------

thank you very much
[jira] [Updated] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liaowenrui updated HDFS-3175:
-----------------------------

    Attachment: testcase
[jira] [Updated] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3077:
------------------------------

    Attachment: qjournal-design.pdf

Attached a design doc draft. Look forward to your comments.

Quorum-based protocol for reading and writing edit logs
-------------------------------------------------------

                 Key: HDFS-3077
                 URL: https://issues.apache.org/jira/browse/HDFS-3077
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ha, name-node
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
         Attachments: hdfs-3077-partial.txt, qjournal-design.pdf

Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow.
[jira] [Commented] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than block size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245071#comment-13245071 ]

Todd Lipcon commented on HDFS-3181:
-----------------------------------

Jenkins seems to have reproduced this above.
[jira] [Updated] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liaowenrui updated HDFS-3175:
-----------------------------

    Attachment: HDFS-3175.patch

modify format
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245112#comment-13245112 ]

Flavio Junqueira commented on HDFS-3092:
----------------------------------------

Hi Suresh, Thanks for sharing a design document. I have a few comments and questions, if you don't mind:

# I find this design to be very close to BookKeeper, with a few important differences. One noticeable difference that has been mentioned elsewhere is that bookies implement mechanisms to enable high performance when there are multiple concurrent ledgers being written to. Your design does not seem to consider the possibility of multiple concurrent logs, which you may want to have for federation. Federation will be useful for large deployments, but not for small deployments. It sounds like a good idea to have a solution that covers both cases.
# There have been comments about comparing the different approaches discussed, and I was wondering what criteria you have been thinking of using to compare them. I guess it can't be performance because, as the numbers Ivan has generated show, the current bottleneck is the namenode code, not the logging. Until the existing bottlenecks in the namenode code are removed, having a fast logging mechanism won't make much difference with respect to throughput.
# I was wondering how reads of the log are executed if writes only have to reach a majority quorum. Once it is time to read, how does the reader get a consistent view of the log? One JD alone may not have all entries, so I suppose the reader may need to read from multiple JDs to get a consistent view? Do the transaction identifiers establish the order of entries in the log? One quick note is that I don't see why a majority is required; BK does not require a majority.

Here are some notes I took comparing the BK approach with the one in this jira, in case you're interested:

# *Rolling*: The notion of rolling here is equivalent to closing a ledger and creating a new one. As ledgers are identified with numbers that are monotonically increasing, the ledger identifiers can be used to order the sequence of logs created over time.
# *Single writer*: Only one client can add new entries to a ledger. We have the notion of a recovery client, which is essentially a reader that makes sure that there is agreement on the end of the ledger. Such a recovery client is also able to write entries, but these writes are simply to make sure that there is enough replication.
# *Fencing*: We fence ledgers individually, so that we guarantee that all operations a ledger writer returns successfully are persisted on enough bookies. This is different from the approach proposed here, which essentially fences logging as a whole.
# *Split brain*: In a split-brain situation, BK can have two writers each writing to a different ledger. However, my understanding is that a namenode that is failing over cannot make progress without reading the previous log (ledger); consequently, this situation cannot occur with BK, and we don't require writes to a majority for correctness.
# *Adding JDs*: The mechanism described here mentions explicitly adding a new JD. My understanding is that a new JD is brought up and it is told somehow to connect to the namenode and to another JD in the JournalList to sync up. BK currently only picks bookies from a pool of available bookies through zookeeper. It shouldn't be a problem to allow a fixed list of bookies to be passed upon creating a ledger.
# *Striping*: BK implements striping, although that's an optional feature. It is possible to use a configuration like 2-2 or 3-3 (Q-N, Q=quorum size and N=ensemble size).
# *Failure detection*: BK uses zookeeper ephemeral nodes to track bookies that are available. A client also changes its ensemble view if it loses a bookie, by adding a new bookie. I'm not exactly sure how you monitor crashes here. Is it the namenode that keeps track of which JDs in the JournalList are available?
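[Editor's note] The question above about consistent reads under majority writes can be illustrated with a toy sketch. This is not HDFS or BookKeeper code; all names and data structures are invented for illustration. The key property is that any two majorities of the same ensemble intersect, so a reader that merges logs from any majority sees every entry whose write was acknowledged by a majority, and transaction ids give the total order:

```python
# Toy model: 3 journal daemons (JDs), each a dict with a txid -> entry log
# and an "up" flag. A write commits once a majority of JDs ack it.
MAJORITY = 2  # of 3 in this example

def write(journals, txid, entry, acks_needed=MAJORITY):
    """Append entry to every reachable JD; 'commit' iff a majority acked."""
    acked = 0
    for j in journals:
        if j.get("up", True):
            j["log"][txid] = entry
            acked += 1
    return acked >= acks_needed

def read_consistent(journals):
    """Merge logs from any majority of JDs; sorting by txid recovers the
    log order. Any committed txid is on at least one JD in the majority,
    because two majorities of the same ensemble always intersect."""
    reachable = [j for j in journals if j.get("up", True)]
    assert len(reachable) >= MAJORITY, "need a majority to read"
    merged = {}
    for j in reachable:
        merged.update(j["log"])  # every copy of a given txid is identical
    return [merged[t] for t in sorted(merged)]
```

For example, if JD3 is down during the write and JD1 is down during the read, the reader still recovers the entry from JD2, which is in both majorities.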
Enable journal protocol based editlog streaming for standby namenode
---------------------------------------------------------------------

                 Key: HDFS-3092
                 URL: https://issues.apache.org/jira/browse/HDFS-3092
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: ha, name-node
    Affects Versions: 0.24.0, 0.23.3
            Reporter: Suresh Srinivas
            Assignee: Suresh Srinivas
         Attachments: MultipleSharedJournals.pdf

Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. BackupNode used streaming edits from active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in standby will get the editlogs from the active and write it to
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245119#comment-13245119 ]

amith commented on HDFS-3179:
-----------------------------

Hi Zhanwei Wang,

I don't know exactly what your test script does, but this looks similar to HDFS-3091. Can you check this once: https://issues.apache.org/jira/browse/HDFS-3091

Please correct me if I am wrong :)

failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
------------------------------------------------------------------------------------------------------------------------

                 Key: HDFS-3179
                 URL: https://issues.apache.org/jira/browse/HDFS-3179
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.23.2
            Reporter: Zhanwei.Wang
            Priority: Critical

Steps to reproduce:
* create a single datanode cluster
* disable permissions
* enable webhdfs
* start hdfs
* run the test script

Expected result: a file named test is created and the content is "testtest".

The result I got: hdfs throws an exception on the second append operation.

{code}
./test.sh
{"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}}
{code}

Log in datanode:

{code}
2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
{code}

test.sh:

{code}
#!/bin/sh
echo test > test.txt
curl -L -X PUT "http://localhost:50070/webhdfs/v1/test?op=CREATE";
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND";
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND";
{code}
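[Editor's note] The check that fails here can be sketched in miniature. This is illustrative Python, not the actual DFSOutputStream logic, and the function names are invented. During pipeline recovery on append, the client asks the namenode for a pipeline one datanode longer than the one that failed; on a single-datanode cluster there is no extra node to offer, so the length check can never pass:

```python
def additional_datanode(original, cluster):
    """Namenode side (simplified): return the old pipeline plus at most
    one datanode that is not already in it."""
    extras = [d for d in cluster if d not in original]
    return original + extras[:1]

def check_replacement(original, cluster):
    """Client side (simplified): insist that a replacement was found,
    i.e. the new pipeline is exactly one node longer than the old one."""
    nodes = additional_datanode(original, cluster)
    if len(nodes) != len(original) + 1:
        # Mirrors the message in the report above.
        raise IOError("Failed to add a datanode: nodes.length != "
                      "original.length + 1, nodes=%s, original=%s"
                      % (nodes, original))
    return nodes
```

In practice, the dfs.client.block.write.replace-datanode-on-failure client settings control whether this replacement step is attempted at all, which matters on clusters with very few datanodes.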
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245140#comment-13245140 ]

Uma Maheswara Rao G commented on HDFS-3179:
-------------------------------------------

@Zhanwei, How many DNs are running in your test cluster?
[jira] [Updated] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Singhi updated HDFS-3119:
--------------------------------

    Attachment: HDFS-3119-1.patch

Thanks Brandon for being so kind. Thanks Uma, Nicholas and Brandon for the patch review and comments. The latest patch addresses Uma's, Nicholas's and Brandon's comments. Also added a test case for the patch. Thanks Uma for your offline help as well :)

Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
--------------------------------------------------------------------------------------------------------------------------

                 Key: HDFS-3119
                 URL: https://issues.apache.org/jira/browse/HDFS-3119
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.24.0
            Reporter: J.Andreina
            Assignee: Ashish Singhi
            Priority: Minor
              Labels: patch
             Fix For: 0.24.0, 0.23.2
         Attachments: HDFS-3119-1.patch, HDFS-3119.patch

Cluster setup: 1 NN, 2 DN, replication factor 2, block report interval 3 sec, block size 256 MB

step1: write a file filewrite.txt of size 90 bytes with sync (not closed)
step2: change the replication factor to 1 using the command: ./hdfs dfs -setrep 1 /filewrite.txt
step3: close the file

* At the NN side, the "Decreasing replication from 2 to 1 for /filewrite.txt" log has occurred, but the overreplicated blocks are not deleted even after the block report is sent from the DN
* While listing the file in the console using ./hdfs dfs -ls, the replication factor for that file is shown as 1
* The fsck report for that file displays that the file is replicated to 2 datanodes
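[Editor's note] The invariant this report says is violated can be stated in a few lines. This is an illustrative Python sketch, not NameNode code: once the file is closed with replication factor 1, one of the two replicas reported by the datanodes is excess and should be scheduled for deletion.

```python
def excess_replicas(reported, replication_factor):
    """Replicas beyond the target factor are excess and should be
    invalidated; with no excess, nothing is scheduled for deletion."""
    excess_count = len(reported) - replication_factor
    return reported[replication_factor:] if excess_count > 0 else []
```

The real namenode picks which replica to invalidate using placement heuristics (rack awareness, free space); the point here is only that the replica count should drop to the factor, which is what the bug report says was not happening.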
[jira] [Updated] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Singhi updated HDFS-3119:
--------------------------------

    Labels: patch (was: )
    Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245282#comment-13245282 ]

Hadoop QA commented on HDFS-3119:
---------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12521139/HDFS-3119-1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2168//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2168//console

This message is automatically generated.
[jira] [Commented] (HDFS-3120) Enable hsync and hflush by default
[ https://issues.apache.org/jira/browse/HDFS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245287#comment-13245287 ] Hudson commented on HDFS-3120: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) Previous commit was for HDFS-3120, fixing up CHANGES.txt (Revision 1308615) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308615 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable hsync and hflush by default -- Key: HDFS-3120 URL: https://issues.apache.org/jira/browse/HDFS-3120 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.0 Attachments: hdfs-3120.txt, hdfs-3120.txt The work on branch-20-append was to support *sync*, for durable HBase WALs, not *append*. The branch-20-append implementation is known to be buggy. There's been confusion about this, we often answer queries on the list [like this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable correct sync on branch-1 for HBase is to set dfs.support.append to true in your config, which has the side effect of enabling append (which we don't want to do). Let's add a new *dfs.support.sync* option that enables working sync (which is basically the current dfs.support.append flag modulo one place where it's not referring to sync). For compatibility, if dfs.support.append is set, dfs.support.sync will be set as well. This way someone can enable sync for HBase and still keep the current behavior that if dfs.support.append is not set then an append operation will result in an IOE indicating append is not supported. We should do this on trunk as well, as there's no reason to conflate hsync and append with a single config even if append works. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3130) Move FSDataset implementation to a package
[ https://issues.apache.org/jira/browse/HDFS-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245285#comment-13245285 ] Hudson commented on HDFS-3130: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) HDFS-3130. Move fsdataset implementation to a package. (Revision 1308437) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308437 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaAlreadyExistsException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaNotFoundException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaUnderRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaWaitingToBeRecovered.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicasMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java *
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245289#comment-13245289 ] Hudson commented on HDFS-3148: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308617) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308614) HADOOP-8210. Common side of HDFS-3148: The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308457) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308617 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308614 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/TestFiPipelines.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/permission/TestStickyBit.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/FileAppendTest4.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend2.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend3.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend4.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreationDelete.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPipelines.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReadWhileWriting.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRenameWhileOpen.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestDatanodeRestart.java eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308457 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/DNS.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetUtils.java The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt
[jira] [Commented] (HDFS-3126) Journal stream from the namenode to backup needs to have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245293#comment-13245293 ] Hudson commented on HDFS-3126: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) HDFS-3126. Journal stream from Namenode to BackupNode needs to have timeout. Contributed by Hari Mankude. (Revision 1308636) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308636 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java Journal stream from the namenode to backup needs to have a timeout -- Key: HDFS-3126 URL: https://issues.apache.org/jira/browse/HDFS-3126 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Fix For: 0.24.0 Attachments: hdfs-3126.patch, hdfs-3126.patch
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245314#comment-13245314 ] Suresh Srinivas commented on HDFS-3148: --- Hey guys, can you do this work in a separate branch as well? There is too much going on to catch up on. I have not had time to look into the proposal, and my feeling was: is this complexity worth adding? Though I have not had time to think about how much complexity this feature adds. Also, is Daryn's concern addressed? The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as for the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. Binding the client socket to a specific address is not by itself sufficient to ensure egress traffic uses that interface. E.g., if multiple interfaces are on the same subnet, the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however, it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require clients to have. 
Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is actually useful independent of the other HDFS-3140 changes.
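The bind-before-connect approach described above can be sketched like this (illustrative Python rather than the actual Java client code; the helper names are invented, and as the description notes, binding alone may not be enough without source-address routing rules on the host):

```python
import random
import socket

def pick_local_address(addrs):
    # Random selection spreads connections across interfaces; configuring
    # a single address pins the client to that interface (the policy
    # described for dfs.client.local.interfaces).
    return random.choice(addrs)

def connect_from(local_addrs, remote_addr):
    """Bind the outbound socket to a chosen local address before
    connecting, so egress traffic can use that interface."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((pick_local_address(local_addrs), 0))  # port 0 = ephemeral
    s.connect(remote_addr)
    return s

# Demo against a loopback listener.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
c = connect_from(["127.0.0.1"], srv.getsockname())
local_ip = c.getsockname()[0]
print(local_ip)  # -> 127.0.0.1
c.close()
srv.close()
```

This is why SO_BINDTODEVICE is not needed for the common case: plain bind() selects the source address, and the OS routing configuration does the rest.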
[jira] [Commented] (HDFS-3120) Enable hsync and hflush by default
[ https://issues.apache.org/jira/browse/HDFS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245335#comment-13245335 ] Hudson commented on HDFS-3120: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) Previous commit was for HDFS-3120, fixing up CHANGES.txt (Revision 1308615) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308615 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable hsync and hflush by default -- Key: HDFS-3120 URL: https://issues.apache.org/jira/browse/HDFS-3120 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.0 Attachments: hdfs-3120.txt, hdfs-3120.txt The work on branch-20-append was to support *sync*, for durable HBase WALs, not *append*. The branch-20-append implementation is known to be buggy. There's been confusion about this, we often answer queries on the list [like this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable correct sync on branch-1 for HBase is to set dfs.support.append to true in your config, which has the side effect of enabling append (which we don't want to do). Let's add a new *dfs.support.sync* option that enables working sync (which is basically the current dfs.support.append flag modulo one place where it's not referring to sync). For compatibility, if dfs.support.append is set, dfs.support.sync will be set as well. This way someone can enable sync for HBase and still keep the current behavior that if dfs.support.append is not set then an append operation will result in an IOE indicating append is not supported. We should do this on trunk as well, as there's no reason to conflate hsync and append with a single config even if append works. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3130) Move FSDataset implementation to a package
[ https://issues.apache.org/jira/browse/HDFS-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245333#comment-13245333 ] Hudson commented on HDFS-3130: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) HDFS-3130. Move fsdataset implementation to a package. (Revision 1308437) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308437 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaAlreadyExistsException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * 
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245337#comment-13245337 ] Hudson commented on HDFS-3148: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308617) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308614) HADOOP-8210. Common side of HDFS-3148: The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308457) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308617 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308614 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * 
[jira] [Commented] (HDFS-3126) Journal stream from the namenode to backup needs to have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245341#comment-13245341 ] Hudson commented on HDFS-3126: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) HDFS-3126. Journal stream from Namenode to BackupNode needs to have timeout. Contributed by Hari Mankude. (Revision 1308636) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308636 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java Journal stream from the namenode to backup needs to have a timeout -- Key: HDFS-3126 URL: https://issues.apache.org/jira/browse/HDFS-3126 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Fix For: 0.24.0 Attachments: hdfs-3126.patch, hdfs-3126.patch
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245346#comment-13245346 ] Suresh Srinivas commented on HDFS-3092: --- bq. Your design does not seem to consider the possibility of multiple concurrent logs, which you may want to have for federation. For HDFS editlogs, my feeling is that there will only be three JDs: one on the active namenode, a second on the standby, and a third JD on one of the other machines. In federation, one has to configure a JD per federated namespace. An alternative is to use BookKeeper, since it could make the deployment simpler for a large federated cluster. bq. There have been comments about comparing the different approaches discussed, and I was wondering what criteria you have been thinking of using to compare them. I think the comment was more about comparing the design and the complexity of deployment, not about benchmarks for the two systems. Performance is not the motivation for this jira. bq. I was wondering about how reads to the log are executed if writes only have to reach a majority quorum. Once it is time to read, how does the reader get a consistent view of the log? One JD alone may not have all entries, so I suppose the reader may need to read from multiple JDs to get a consistent view? Do the transaction identifiers establish the order of entries in the log? One quick note is that I don't see why a majority is required; bk does not require a majority. We decided on a majority quorum to keep the design simple, though it is strictly not necessary. A JD in the JournalList is supposed to have all the entries, and any JD from the list can be used to read the journals. bq. Here are some notes I took comparing the bk approach with the one in this jira, in case you're interested I noticed that as well. After we went through the many issues this solution had to take care of, it looks very similar to BK. 
That is comforting :-) Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf Currently the standby namenode relies on reading shared editlogs to stay current with the active namenode for namespace changes. BackupNode used streaming edits from the active namenode for the same purpose. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading edits from a file will continue to be used, reading from the local edits instead of from shared edits.
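The majority-quorum rule discussed in the comment above (chosen for simplicity, with three JDs in the typical deployment) can be sketched as follows; this is an illustrative model of the durability check, not the actual journal daemon code:

```python
def write_is_durable(num_jds, acks):
    # A write is considered durable once a strict majority of the
    # journal daemons (JDs) have acknowledged it.
    return acks > num_jds // 2

# Three JDs, as in the deployment sketched in the comment:
print(write_is_durable(3, 2))  # -> True
print(write_is_durable(3, 1))  # -> False
```

Note the comment's caveat that a majority is not strictly necessary; the design trades generality for simplicity, and a JD on the JournalList is expected to hold all entries so readers can use any one of them.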
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245366#comment-13245366 ] Daryn Sharp commented on HDFS-3148: --- Also, is Daryn's concern addressed? I believe so. Part of the confusion was that I didn't fully comprehend Eli's earlier responses. Todd made a great point that we need to ensure we have really good documentation for the feature. It's going to require system-level configuration to work correctly. The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as for the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. Binding the client socket to a specific address is not by itself sufficient to ensure egress traffic uses that interface. E.g., if multiple interfaces are on the same subnet, the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however, it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require clients to have. 
Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is useful independently of the other HDFS-3140 changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
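[Editor's note] The bind-then-connect mechanism described in HDFS-3148 can be illustrated with a small sketch. This is not HDFS code; the class name and loopback addresses are hypothetical, chosen so the example is self-contained. Binding the client socket to a local address before connecting is what makes the kernel stamp that source address on egress traffic, which the host's IP rules can then use for interface selection:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BindLocalInterface {
    // Connect to a remote endpoint, binding the local end of the socket to
    // the given local address first (port 0 = any ephemeral port). Per the
    // comment above, bind() selects the source address; the OS routing
    // rules then decide which interface carries the outbound packets.
    public static Socket connectFrom(String localAddr, InetSocketAddress remote)
            throws IOException {
        Socket s = new Socket();
        s.bind(new InetSocketAddress(localAddr, 0));
        s.connect(remote, 5000); // 5 s connect timeout
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate on loopback so the example runs anywhere.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket s = connectFrom("127.0.0.1",
                    new InetSocketAddress("127.0.0.1", server.getLocalPort()));
            // The local end is bound to the address we asked for.
            System.out.println(s.getLocalAddress().getHostAddress());
            s.close();
        }
    }
}
```

On a multi-homed host, passing a different local interface address to connectFrom would pin that client's traffic to that interface, subject to the routing-rule caveat the comment raises.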
[jira] [Updated] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3166: -- Attachment: HDFS-3166.patch Linux considers the requested listen backlog as advisory... It rounds it up to the next power of 2, with a floor of 16. I modified the test to try up to 32 times to trigger a connect timeout. Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
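[Editor's note] The missing-timeout failure mode in HDFS-3166 comes down to two setters on the JDK's URLConnection. This is a minimal sketch, not the actual patch; the class name and the 60-second value are illustrative only:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    // Without these two calls, connect() or read() can block indefinitely
    // when the remote host disappears mid-transfer -- the hung-socket
    // behavior described in HDFS-3166.
    public static HttpURLConnection open(URL url, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMs); // bound time to establish the TCP connection
        conn.setReadTimeout(timeoutMs);    // bound time waiting for each read
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // openConnection() does not touch the network, so this runs offline.
        HttpURLConnection c = open(new URL("http://example.invalid/file"), 60000);
        System.out.println(c.getConnectTimeout() + " " + c.getReadTimeout());
    }
}
```

With the timeouts set, a network outage surfaces as a java.net.SocketTimeoutException instead of a permanently hung thread.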
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245475#comment-13245475 ] Flavio Junqueira commented on HDFS-3092: Thanks for the responses, Suresh. bq. For HDFS editlogs, my feeling is that there will only be three JDs. One on the active namenode, second on the standby and a third JD on one of the machines. In federation, one has to configure a JD per Federated namespace. Alternative is to use BookKeeper, since it could make the deployment simpler for federated large cluster. When you say three JDs, that's the degree of replication, right? When I said multiple logs, I was referring to multiple namenodes writing to different logs, as with federation. bq. I think the comment was more about comparing the design and complexity of deployment and not benchmarks for two systems. Performance is not the motivation for this jira. Got it. You're thinking about a qualitative design based on the requirements identified. Correctness sounds like an obvious candidate. :-) bq. We decided on majority quorum to keep the design simple, though it is strictly not necessary. A JD in JournalList is supposed to have all the entries and any JD from the list can be used to read the journals. I think my confusion here is that you require a quorum to be able to acknowledge the operation, but in reality you try to write to everyone. If you can't write to everyone, then you induce a view change (change to JournalList). Is this right? Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. 
BackupNode used streaming edits from the active namenode to do the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading from a file will continue to be used, except that it will read from the local edits instead of the shared edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245476#comment-13245476 ] Zhanwei.Wang commented on HDFS-3179: @Uma and amith It seems to be the same issue as HDFS-3091. I configured only one datanode and created a file using the default number of replicas (3); existings(1) <= replication/2 (3/2==1) will be satisfied, and it cannot replace the failed node with a new one as no extra nodes exist in the cluster. The HDFS-3091 patch should be applied to the 0.23.2 branch. failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical Create a single datanode cluster, disable permissions, enable webhdfs, start hdfs, run the test script. Expected result: a file named test is created and the content is testtest. The result I got: hdfs throws an exception on the second append operation. 
{code} ./test.sh {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}} {code} Log in datanode: {code} 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {code} test.sh {code} #!/bin/sh echo test > test.txt curl -L -X PUT http://localhost:50070/webhdfs/v1/test?op=CREATE; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
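[Editor's note] For operators reproducing the single-datanode failure above, the replace-datanode-on-failure behavior is tunable via client configuration. This is a hedged sketch of an hdfs-site.xml fragment; the property names below are taken from the hdfs-default.xml documentation of this feature, but verify them against your release before relying on them:

```xml
<!-- hdfs-site.xml: relax datanode replacement on pipeline failure.
     Appropriate only for very small clusters (e.g. 1-3 datanodes)
     where no spare datanode exists to satisfy the default policy. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>false</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```

With the feature disabled, appends on a single-datanode cluster proceed with the shrunken pipeline instead of failing the sanity check, at the cost of weaker durability guarantees.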
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245501#comment-13245501 ] Zhanwei.Wang commented on HDFS-3179: @Uma and amith Another question: in this test script, I first create a new EMPTY file and append to it twice. The first append succeeds because the file is empty; to create a pipeline, the stage is PIPELINE_SETUP_CREATE and the policy will not be checked. The second append fails because the stage is PIPELINE_SETUP_APPEND and the policy will be checked. So from the user's point of view, the first append succeeds while the second fails; is that a good idea? {code} // get new block from namenode if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) { if(DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Allocating new block"); } nodes = nextBlockOutputStream(src); initDataStreaming(); } else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) { if(DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Append to block " + block); } setupPipelineForAppendOrRecovery(); //check the policy here initDataStreaming(); } {code} failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical Create a single datanode cluster, disable permissions, enable webhdfs, start hdfs, run the test script. Expected result: a file named test is created and the content is testtest. The result I got: hdfs throws an exception on the second append operation. 
{code} ./test.sh {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}} {code} Log in datanode: {code} 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {code} test.sh {code} #!/bin/sh echo test > test.txt curl -L -X PUT http://localhost:50070/webhdfs/v1/test?op=CREATE; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3000) Add a public API for setting quotas
[ https://issues.apache.org/jira/browse/HDFS-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245506#comment-13245506 ] Daryn Sharp commented on HDFS-3000: --- +1. Although I'd suggest adding a ctor that takes a filesystem instance: the user may want to use a custom configured filesystem, or to avoid creating another fs instance if the fs cache is disabled. Add a public API for setting quotas --- Key: HDFS-3000 URL: https://issues.apache.org/jira/browse/HDFS-3000 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3000.patch, HDFS-3000.patch, HDFS-3000.patch, HDFS-3000.patch Currently one can set the quota of a file or directory from the command line, but if a user wants to set it programmatically, they need to use DistributedFileSystem, which is annotated InterfaceAudience.Private. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245515#comment-13245515 ] Zhanwei.Wang commented on HDFS-3091: Hi, Nicholas {quote} I would say the failures are expected. The feature is to guarantee the number of replicas that the user is asking. However, the cluster is too small that the guarantee is impossible. It makes sense to fail the write requests. {quote} I agree with you, but have a look at the code. In HDFS-3179, I first create an EMPTY file and append twice; the first append finishes successfully but the second fails since there is only one datanode and the number of replicas is 3. Is that what you want to see? I think the policy check should fail on the first write to the file. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 2.0.0 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, I observed a couple of issues. Presently the ReplaceDatanodeOnFailure policy is satisfied even though we don't have enough DNs in the cluster to replace a failed one, and this results in write failure. 
{quote} 12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {quote} Let's take some cases: 1) Replication factor 3 and cluster size also 3, and unfortunately the pipeline drops to 1. ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= replication/2 (3/2==1)*. But when it tries to find a new node as the replacement, obviously it cannot find one, and the sanity check will fail. This results in write failure. 2) Replication factor 10 (accidentally the user sets the replication factor to a value higher than the cluster size), and the cluster has only 5 datanodes. Here even if one node fails, the write will fail for the same reason: the pipeline max will be 5, and after one datanode is killed, existings will be 4. *existings(4) <= replication/2 (10/2==5)* will be satisfied, and obviously it cannot replace the failed node with a new one as no extra nodes exist in the cluster. This results in write failure. 3) sync related operations also fail in these situations (will post the clear scenarios) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
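[Editor's note] The integer arithmetic in the two cases above can be made concrete. This is a minimal sketch paraphrasing the replacement condition as the comment states it (existings at or below replication/2 triggers a replacement request), not the actual ReplaceDatanodeOnFailure source:

```java
public class ReplacePolicyCheck {
    // Paraphrase of the condition discussed above: with replication r,
    // ask for a replacement datanode when the surviving pipeline has
    // shrunk to half the replication or less (integer division).
    public static boolean wantsReplacement(int replication, int existings) {
        return existings <= replication / 2;
    }

    public static void main(String[] args) {
        // Case 1: r=3, pipeline drops to 1 -> replacement requested,
        // but a 3-node cluster has no spare node, so the write fails.
        System.out.println(wantsReplacement(3, 1));  // true
        // Case 2: r=10 on a 5-node cluster, one node dies -> existings=4,
        // 4 <= 10/2 -> replacement requested that cannot be satisfied.
        System.out.println(wantsReplacement(10, 4)); // true
        // Contrast: r=3 with 2 survivors -> no replacement requested.
        System.out.println(wantsReplacement(3, 2));  // false
    }
}
```

Both failing cases reduce to the same mismatch: the policy's arithmetic is satisfied, but the cluster has no node left to add to the pipeline.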
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245523#comment-13245523 ] Suresh Srinivas commented on HDFS-3092: --- bq. When you say three JDs, that's the degree of replication, right? When I said multiple logs, I was referring to multiple namenodes writing to different logs, as with federation. Right, three JDs for degree of replication. However, I do understand multiple logs - that is a log per namespace. For every namespace, in federation, active and standby namenode + additional JD is needed. bq. I think my confusion here is that you require a quorum to be able to acknowledge the operation, but in reality you try to write to everyone. If you can't write to everyone, then you induce a view change (change to JournalList). Is this right? Yes. In the first cut we write to all the JDs that are active. At least quorum should be written. This can be improved in the future by waiting for only Quorum JDs. Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. BackupNode used streaming edits from active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in standby will get the editlogs from the active and write it to local edits. To begin with, the existing standby mechanism of reading from a file, will continue to be used, instead of from shared edits, from the local edits. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
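[Editor's note] The write discipline Suresh describes (attempt delivery to every active JD, acknowledge the edit once at least a quorum has written it) can be sketched as a toy model. This is an editor's illustration, not the HDFS-3092 implementation:

```java
public class QuorumAck {
    // ackCount: how many JDs durably wrote the entry; total: JDs in the
    // JournalList. The edit is considered committed once a strict
    // majority acked, even though the writer sent it to all of them.
    public static boolean committed(int ackCount, int total) {
        return ackCount > total / 2;
    }

    public static void main(String[] args) {
        System.out.println(committed(3, 3)); // true: all three JDs wrote it
        System.out.println(committed(2, 3)); // true: 2 of 3 is a majority
        System.out.println(committed(1, 3)); // false: below quorum -> per the
                                             // comment, induce a view change
    }
}
```

The "write to all, wait for quorum" split is what lets a later improvement wait on only a quorum of JDs without changing the commit rule.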
[jira] [Created] (HDFS-3183) Add JournalManager implementation to use local namenode, remote namenode and a configured JournalDaemon for storing editlogs
Add JournalManager implementation to use local namenode, remote namenode and a configured JournalDaemon for storing editlogs Key: HDFS-3183 URL: https://issues.apache.org/jira/browse/HDFS-3183 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas The JournalManager is used in HA configuration and uses the following journal targets: - local namenode - Other namenode - A configured JournalDaemon target from configuration -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245538#comment-13245538 ] Hadoop QA commented on HDFS-3166: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521164/HDFS-3166.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2169//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2169//console This message is automatically generated. Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3043) HA: ZK-based client proxy provider
[ https://issues.apache.org/jira/browse/HDFS-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3043: -- Target Version/s: Auto failover (HDFS-3042) HA: ZK-based client proxy provider -- Key: HDFS-3043 URL: https://issues.apache.org/jira/browse/HDFS-3043 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, hdfs client Reporter: Todd Lipcon Assignee: Aaron T. Myers When HDFS-2185 is implemented, ZooKeeper can be used to locate the active NameNode. We can use this from the DFS client in order to connect to the correct NN without having to configure a list of possibly-active NNs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3065) HA: Newly active NameNode does not recognize decommissioning DataNode
[ https://issues.apache.org/jira/browse/HDFS-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3065: -- Priority: Minor (was: Major) HA: Newly active NameNode does not recognize decommissioning DataNode - Key: HDFS-3065 URL: https://issues.apache.org/jira/browse/HDFS-3065 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Stephen Chu Priority: Minor I'm working on a cluster where, originally, styx01 hosts the active NameNode and styx02 hosts the standby NameNode. In both styx01's and styx02's exclude file, I added the DataNode on styx03. I then ran _hdfs dfsadmin -refreshNodes_ and verified on the styx01 NN web UI that the DN on styx03 was decommissioning. After waiting a few minutes, I checked the standby NN web UI (while the DN was decommissioning) and didn't see that the DN was marked as decommissioning. I executed manual failover, making the styx02 NN active and the styx01 NN standby. I checked the newly active NN web UI, and the DN was still not marked as decommissioning, even after a few minutes. However, the newly standby NN's web UI still showed the DN as decommissioning. I added another DN to the exclude file and executed _hdfs dfsadmin -refreshNodes_, but the styx02 NN web UI still did not update with the decommissioning nodes. I failed back over to make the styx01 NN active and the styx02 NN standby. I checked the styx01 NN web UI and saw that it correctly marked 2 DNs as decommissioning. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3181: - Description: org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins. {code} org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart Failing for the past 1 build (Since Failed#2163 ) Took 8.4 sec. Error Message Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655) Stacktrace org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by 
DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) ... at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy15.getAdditionalDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:317) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:828) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {code} was: org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins. {code} org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart Failing for the past 1 build (Since Failed#2163 ) Took 8.4 sec. 
Error Message Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245578#comment-13245578 ] Hari Mankude commented on HDFS-3077: Todd, The doc is excellent. Had a comment on a potential issue which could result due to epochnumber with certain failure scenarios. Specifically, I am talking about the scenario in section 2.5.6 J1 is at txid 153, J2 is at txid 150 and J3 is at txid 125. Epochnumber on all the journals is 1. Both NN1 and NN2 are trying to become_active() at the same time. NN1 talks to J1, J2 and sets the proposedEpoch to 2. NN2 talks to J2 and J3 and decides to set the proposedEpoch to 2. NN1 succeeds in setting newEpoch to 2 on J1 and fails on J2 and J3. NN1 dies since it does not have quorum. NN2 succeeds in setting newEpoch to 2 on J2 and J3 and has the quorum. NN2 cannot talk to J1. Similar to the scenario in 2.5.6, NN2 writes 151, 152,153 into J2 and J3 and then dies. So currently, state is epoch number is 2 on all the journals and J1, J2 and J3 are at 153. We have a problem since it is not possible to distinguish between log entries in J1 vs J2 and J3. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
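[Editor's note] Hari's epoch-race scenario above can be simulated with a toy model. This is an editor's sketch, not the HDFS-3077 design: each journal simply accepts a proposed epoch that is strictly greater than its current one, which is enough to show how two writers can both stamp epoch 2 onto disjoint journal sets:

```java
import java.util.Arrays;

public class EpochRace {
    // Current epoch of each journal daemon: J1, J2, J3 (indices 0..2).
    static int[] epoch = {1, 1, 1};

    // A namenode proposes `proposed` to the journals it can reach;
    // returns true iff it obtained acks from a strict majority of all
    // three journals.
    static boolean newEpoch(int proposed, int... journals) {
        int acks = 0;
        for (int j : journals) {
            if (proposed > epoch[j]) { epoch[j] = proposed; acks++; }
        }
        return acks > epoch.length / 2;
    }

    public static void main(String[] args) {
        // NN1 reaches only J1 with proposedEpoch=2: no quorum, NN1 dies.
        boolean nn1 = newEpoch(2, 0);
        // NN2 reaches J2 and J3 with the same proposedEpoch=2: quorum.
        boolean nn2 = newEpoch(2, 1, 2);
        // All three journals now carry epoch 2, yet J1's epoch-2 stamp came
        // from a different writer than J2/J3's -- so epoch alone cannot
        // distinguish J1's log entries from J2/J3's, as the comment notes.
        System.out.println(nn1 + " " + nn2 + " " + Arrays.toString(epoch));
        // prints: false true [2, 2, 2]
    }
}
```

A two-phase protocol (propose an epoch strictly greater than any epoch seen by a quorum, then confirm it on a quorum before writing) is the standard way to close this gap; the model above deliberately omits that step to reproduce the hazard.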
[jira] [Commented] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245582#comment-13245582 ] Tsz Wo (Nicholas), SZE commented on HDFS-3181: -- Hi Todd, thanks for clarifying it. I can reproduce the failure now. testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size - Key: HDFS-3181 URL: https://issues.apache.org/jira/browse/HDFS-3181 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Colin Patrick McCabe Priority: Critical Attachments: TestLeaseRecovery2with1535.patch, repro.txt, testOut.txt org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins. {code} org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart Failing for the past 1 build (Since Failed#2163 ) Took 8.4 sec. Error Message Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655) Stacktrace org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) ... 
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy15.getAdditionalDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:317) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:828) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3153) For HA, a logical name is visible in URIs - add an explicit logical name
[ https://issues.apache.org/jira/browse/HDFS-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3153: -- Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-1623) For HA, a logical name is visible in URIs - add an explicit logical name Key: HDFS-3153 URL: https://issues.apache.org/jira/browse/HDFS-3153 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-3110: - Attachment: HDFS-3110.2.patch Patch addressing Todd's concerns. I added a 'flags' field to hdfsFile that has a bit set if a direct read is supported. I detect that by trying to issue a 0-byte read when the file is created. If an exception is thrown, the flag is cleared, otherwise it is set. Once the flag is set, all subsequent hdfsRead calls will be diverted to hdfsReadDirect. An alternative is to use reflection to grab the input stream inside FsDataInputStream and use reflection to look for ByteBufferReadable, but that feels a little fragile (and complex to do in C); plus if some FS implements read(ByteBuffer) only to stub it out with a UnsupportedOperationException or similar, reads would never work correctly. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
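Henry's probe-and-cache approach generalizes beyond libhdfs. The sketch below (Python, with invented names; the real code is C against the JNI bindings) shows the pattern: issue a zero-byte direct read once at open time, record the result in a flag, and route every later read based on that flag:

```python
# Sketch (hypothetical names, not libhdfs) of the probe-and-cache pattern:
# attempt a zero-byte direct read once when the file is opened; if it
# raises, clear the flag and use the ordinary read path from then on.

class DirectReadUnsupported(Exception):
    pass

class HdfsFile:
    def __init__(self, stream):
        self.stream = stream
        try:
            stream.read_direct(0)       # zero-byte probe at open time
            self.direct_ok = True
        except DirectReadUnsupported:
            self.direct_ok = False

    def read(self, n):
        # Divert to the direct path only if the open-time probe succeeded.
        if self.direct_ok:
            return self.stream.read_direct(n)
        return self.stream.read(n)

class DirectStream:
    """Stand-in for a stream that supports direct (ByteBuffer-style) reads."""
    def read_direct(self, n): return b"x" * n
    def read(self, n): return b"x" * n

class LegacyStream:
    """Stand-in for a stream that only supports the classic read path."""
    def read_direct(self, n): raise DirectReadUnsupported()
    def read(self, n): return b"y" * n

assert HdfsFile(DirectStream()).direct_ok is True
assert HdfsFile(LegacyStream()).direct_ok is False
```

Probing once at open time also sidesteps the fragility Henry notes with reflection: a filesystem that stubs out the direct-read method with an exception is detected immediately and simply falls back.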
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245588#comment-13245588 ] Suresh Srinivas commented on HDFS-3077: --- Thanks for posting the design. Now I understand your comment that there is a lot in common between this one and the approach in HDFS-3092. Here are some high level comments: # Terminology - JournalDaemon or JournalNode. I prefer JournalDaemon because my plan was to run them in the same process space as the namenode. A JournalDaemon could also be a stand-alone process. # I like the idea of quorum writes and maintaining the queue. The 3092 design currently uses a timeout to declare a JD slow and fail it. We were planning to punt on it until we had a first implementation. # newEpoch() is called fence() in HDFS-3092. My preference is to use the name fence(). I was using version # for what is called epoch here; I think the name epoch sounds better. The key difference is that the version # is generated from a znode in HDFS-3092, so two namenodes cannot use the same epoch number. I think there is a bug with the approach you have described, stemming from the fact that two namenodes can use the same epoch and step 3 in 2.4 can be completed independent of quorum. This is shown in Hari's example. # I prefer to record the epoch in the startLogSegment filler record. The startLogSegment record was never part of the journal; we added it for structural reasons, so adding epoch info to it should not matter. The way I see it, a journal belongs to a segment, and a segment has a single version # or epoch. # In both proposals the epoch or version # needs to be sent in all journal requests. We could certainly make a list of common work items and create jiras, so that many people can collaborate and wrap it up, like we did in HDFS-1623. 
Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245602#comment-13245602 ] Hadoop QA commented on HDFS-3110: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521180/HDFS-3110.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2170//console This message is automatically generated. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245601#comment-13245601 ] Tsz Wo (Nicholas), SZE commented on HDFS-3179: -- I think the problem is one datanode with replication 3. What should be the user expectation? It seems that users won't be happy if we do not allow append. However, if we allow appending to a single replica and the replica becomes corrupted, then it is possible to have data loss - I can imagine in some extreme cases that a user is appending to a single replica slowly, an admin adds more datanodes later on but the block won't be replicated since the file is not closed, and then the datanode with the single replica fails. Is this case acceptable to you? So from the user's view, the first append succeeds while the second fails; is that a good idea? The distinction is whether there is pre-append data. There is pre-append data in the replica in the second append. The pre-append data was in a closed file, and if the datanode fails during append, there could be data loss. However, in the first append, there is no pre-append data. If the append fails and the new replica is lost, it is sort of okay since only the new data is lost. The add-datanode feature is to prevent data loss on pre-append data. Users (or an admin) could turn it off as mentioned in HDFS-3091. I think we may improve the error message. Is it good enough? Or any suggestion? 
failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical Create a single datanode cluster disable permissions enable webhfds start hdfs run the test script expected result: a file named test is created and the content is testtest the result I got: hdfs throw an exception on the second append operation. {code} ./test.sh {RemoteException:{exception:IOException,javaClassName:java.io.IOException,message:Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]}} {code} Log in datanode: {code} 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {code} test.sh {code} #!/bin/sh echo test > test.txt curl -L -X PUT http://localhost:50070/webhdfs/v1/test?op=CREATE; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; {code}
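For reference, the client-side add-datanode behavior Nicholas mentions can be disabled in the client configuration. The property name below reflects my understanding of the HDFS-3091-era settings and should be checked against hdfs-default.xml for your release:

```xml
<!-- hdfs-site.xml (client side): disable the replace-datanode-on-failure
     feature so appends on tiny clusters do not fail while looking for an
     extra datanode. Property name assumed from HDFS-3091; verify for your
     release. Disabling it trades away the protection for pre-append data
     that Nicholas describes above. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>false</value>
</property>
```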
[jira] [Created] (HDFS-3184) Add public HDFS client API
Add public HDFS client API -- Key: HDFS-3184 URL: https://issues.apache.org/jira/browse/HDFS-3184 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE There are some useful operations in HDFS but not in the FileSystem API; see a list in [Uma's comment|https://issues.apache.org/jira/browse/HDFS-1599?focusedCommentId=13243105page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243105]. These operations should be made available to the public.
[jira] [Commented] (HDFS-1599) Umbrella Jira for Improving HBASE support in HDFS
[ https://issues.apache.org/jira/browse/HDFS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245616#comment-13245616 ] Tsz Wo (Nicholas), SZE commented on HDFS-1599: -- Most of the reflection in HBase has to do with version compatibility, not accessing private APIs. Adding a new API on HDFS doesn't solve the problem, really, since the whole reason for the reflection is to compile against old versions which don't have the new APIs It does not solve the problem today but it will solve the problem in the future. :) Umbrella Jira for Improving HBASE support in HDFS - Key: HDFS-1599 URL: https://issues.apache.org/jira/browse/HDFS-1599 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Umbrella Jira for improved HBase support in HDFS
[jira] [Commented] (HDFS-1599) Umbrella Jira for Improving HBASE support in HDFS
[ https://issues.apache.org/jira/browse/HDFS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245617#comment-13245617 ] Tsz Wo (Nicholas), SZE commented on HDFS-1599: -- Uma, thanks for listing them out. I have created HDFS-3184 for adding new HDFS client APIs. Umbrella Jira for Improving HBASE support in HDFS - Key: HDFS-1599 URL: https://issues.apache.org/jira/browse/HDFS-1599 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Umbrella Jira for improved HBase support in HDFS
[jira] [Commented] (HDFS-3184) Add public HDFS client API
[ https://issues.apache.org/jira/browse/HDFS-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245629#comment-13245629 ] Uma Maheswara Rao G commented on HDFS-3184: --- Great, Thanks a lot Nicholas for filing the JIRA. :-) Add public HDFS client API -- Key: HDFS-3184 URL: https://issues.apache.org/jira/browse/HDFS-3184 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE There are some useful operations in HDFS but not in the FileSystem API; see a list in [Uma's comment|https://issues.apache.org/jira/browse/HDFS-1599?focusedCommentId=13243105page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243105]. These operations should be made available to the public.
[jira] [Updated] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3166: - Resolution: Fixed Fix Version/s: 3.0.0 2.0.0 0.23.3 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 the patch looks good. The failed test is not related. I have committed this. Thanks, Daryn! Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
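The failure mode Daryn's patch addresses is generic: a blocking read on a socket with no read timeout hangs forever if the peer goes silent during a network outage. A minimal Python illustration of the bug class and the fix (the actual Hftp code is Java):

```python
# Generic illustration (Python sockets, not the Java Hftp code) of the
# bug class HDFS-3166 fixes: recv() with no read timeout blocks forever
# when the remote side goes silent; settimeout() bounds the wait.
import socket

# A listener that accepts the connection but never sends any data,
# standing in for a remote host that went quiet during a network outage.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
client.settimeout(0.2)          # the fix: bound how long recv() may block
conn, _ = server.accept()       # keep the peer socket open, send nothing

try:
    client.recv(1)              # would hang indefinitely without a timeout
    timed_out = False
except socket.timeout:
    timed_out = True

assert timed_out
for s in (client, conn, server):
    s.close()
```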
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245656#comment-13245656 ] Todd Lipcon commented on HDFS-3077: --- bq. So currently, state is epoch number is 2 on all the journals and J1, J2 and J3 are at 153. We have a problem since it is not possible to distinguish between log entries in J1 vs J2 and J3. Hey Hari. Thanks for taking a look in such good detail. I think the doc is currently unclear about the proposed solution described in 2.5.6 -- the idea is not to use just the lastPromisedEpoch here to distinguish the JNs, but rather to attach the epoch number to each log segment, based on the epoch in which that segment was started. So, even though in your scenario NN1 sets J1.lastPromisedEpoch=2, the log segment will retain e=1. Once a segment's epoch is set, it is never changed (unless the segment is removed by a synchronization) Does that make sense? If so I will try to clarify the document. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. 
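Todd's fix to Hari's scenario can be illustrated the same way: if the epoch is stamped on each log segment when the segment is started, and never rewritten when lastPromisedEpoch later advances, recovery can tell the writers apart even though lastPromisedEpoch is 2 everywhere. A toy sketch (hypothetical, not the HDFS implementation):

```python
# Toy illustration of Todd's point: the epoch is attached to each log
# segment when the segment is started and is not rewritten when
# lastPromisedEpoch later advances, so recovery can prefer data written
# under the highest per-segment epoch.

journals = {
    # name: (lastPromisedEpoch, segment_epoch, last_txid)
    "J1": (2, 1, 153),   # NN1 bumped lastPromisedEpoch, but segment keeps e=1
    "J2": (2, 2, 153),   # NN2 wrote 151-153 under epoch 2
    "J3": (2, 2, 153),
}

# lastPromisedEpoch is identical everywhere and cannot disambiguate...
assert len({promised for promised, _, _ in journals.values()}) == 1

# ...but the per-segment epoch can: prefer the journals whose current
# segment was started in the highest epoch.
best = max(seg for _, seg, _ in journals.values())
authoritative = [n for n, (_, seg, _) in journals.items() if seg == best]
assert authoritative == ["J2", "J3"]
```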
[jira] [Updated] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3176: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have committed this. Thanks, Kihwal! JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and constructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields().
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245664#comment-13245664 ] Hudson commented on HDFS-3166: -- Integrated in Hadoop-Hdfs-trunk-Commit #2058 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2058/]) HDFS-3166. Add timeout to Hftp connections. Contributed by Daryn Sharp (Revision 1309103) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309103 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245663#comment-13245663 ] Hudson commented on HDFS-3176: -- Integrated in Hadoop-Hdfs-trunk-Commit #2058 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2058/]) HDFS-3176. Use MD5MD5CRC32FileChecksum.readFields() in JsonUtil . Contributed by Kihwal Lee (Revision 1309114) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309114 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and contructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245668#comment-13245668 ] Hudson commented on HDFS-3166: -- Integrated in Hadoop-Common-trunk-Commit #1983 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1983/]) HDFS-3166. Add timeout to Hftp connections. Contributed by Daryn Sharp (Revision 1309103) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309103 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245667#comment-13245667 ] Hudson commented on HDFS-3176: -- Integrated in Hadoop-Common-trunk-Commit #1983 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1983/]) HDFS-3176. Use MD5MD5CRC32FileChecksum.readFields() in JsonUtil . Contributed by Kihwal Lee (Revision 1309114) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309114 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and contructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245681#comment-13245681 ] Hudson commented on HDFS-3176: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1996/]) HDFS-3176. Use MD5MD5CRC32FileChecksum.readFields() in JsonUtil. Contributed by Kihwal Lee (Revision 1309114) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309114 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil, used by webhdfs, parses the MD5MD5CRC32FileChecksum binary bytes on its own and constructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields().
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245682#comment-13245682 ] Hudson commented on HDFS-3166: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1996/]) HDFS-3166. Add timeout to Hftp connections. Contributed by Daryn Sharp (Revision 1309103) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309103 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245686#comment-13245686 ] Eli Collins commented on HDFS-3148: --- Hey Suresh, This feature is actually independent of all the other HDFS-3140 sub-tasks, and multihoming in general, and therefore does not require any further jiras. It covers using multiple interfaces on the *client* side; the others are all about using multiple interfaces on the *server* side. These can both be used independently, e.g. it's just as valuable to use multiple local interfaces on the client side even if you don't use multihoming on the server side. Happy to pull it out to its own top-level jira if that's clearer. Ditto, lemme know if you think the other HDFS-3140 jiras should be in a branch. Just enabling multihoming requires HDFS-3146 and HDFS-3147, and a branch for a couple of jiras felt like overkill. Much of the work has been in the cleanup of DatanodeID and friends. Thanks, Eli The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. 
Binding the client socket to a specific address is not sufficient to ensure egress traffic uses that interface. E.g. if multiple interfaces are on the same subnet, the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however, it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require of clients. Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is actually useful/independent of the other HDFS-3140 changes.
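The binding step described above (bind the client socket to a chosen local address before connecting) can be sketched with plain JDK sockets. This is an illustration only, not the HDFS-3148 patch code; the address is hardcoded to loopback here, whereas the patch resolves addresses from dfs.client.local.interfaces:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class LocalBindSketch {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) {
            // Bind to a specific local address before connect(); outbound
            // traffic then carries this source address, so the kernel's
            // routing rules select the matching interface. Port 0 asks the
            // OS for an ephemeral port.
            s.bind(new InetSocketAddress("127.0.0.1", 0));
            System.out.println(s.getLocalAddress().getHostAddress());
        }
    }
}
```

As the comment thread notes, this only sets the source address; whether egress actually leaves the matching interface still depends on the host's IP routing rules.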
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245691#comment-13245691 ] Todd Lipcon commented on HDFS-3077: --- bq. Terminology - JournalDaemon or JournalNode. I prefer JournalDaemon because my plan was to run them in the same process space as the namenode. A JournalDaemon could also be a stand-alone process. I prefer JournalNode because every other daemon we have is a *Node. If you're running it inside another process, I think we would just call it a JournalService -- or an embedded JournalNode. I think of a daemon as a standalone process. bq. I like the idea of quorum writes and maintaining the queue. 3092 design currently uses timeout to declare a JD slow and fail it. We were planning to punt on it until we had a first implementation. OK. This part I have done in the patch attached here and it works pretty well so far. If you want, I'm happy to separate out the quorum completion code to commit it ASAP so we can share code here. bq. newEpoch() is called fence() in HDFS-3092. My preference is to use the name fence(). I was using a version # which is called epoch. I think the name epoch sounds better. The key difference is that the version # is generated from a znode in HDFS-3092. As I had commented earlier on this ticket, I originally was planning to do something similar to you, bootstrapping off of ZK to generate epoch numbers. But then, when I got into coding, I realized that this algorithm is actually not so hard to implement, and adding a dependency on ZK actually adds to the combinatorics of things to think about. I think the standalone nature of the approach outweighs what benefit we might get by reusing ZK. bq. So two namenodes cannot use the same epoch number. I think there is a bug with the approach you have described, stemming from the fact that two namenodes can use the same epoch, and step 3 in 2.4 can be completed independent of quorum. This is shown in Hari's example. 
How can step 3 in section 2.4 be completed independent of quorum? Step 4 indicates that it requires a quorum of nodes to respond successfully to the {{newEpoch}} message. Here's an example: Initial state:
||Node||lastPromisedEpoch||
|JN1|1|
|JN2|1|
|JN3|1|
1. Two NNs (NN1 and NN2) enter step 1 concurrently. They both receive responses indicating {{lastPromisedEpoch==1}} from all of the JNs.
2. They both propose {{newEpoch(2)}}. The behavior of the JN ensures that it will only respond success to either NN1 or NN2, but not both (since it will fail if proposedEpoch <= lastPromisedEpoch).
So, either NN1 or NN2 gets success from a majority. The other node will only get success from a minority, and thus will abort. Note that with message losses or failures, it's possible for _neither_ of the nodes to get a quorum in the case of a race. That's OK, since we expect that an external leader election framework will eventually assist such that only one NN is trying to become active, and then that NN will win. Note that the epoch algorithm is cribbed from ZAB; see page 7 of Yahoo tech report YL-2010-0007. The mapping from ZAB terminology is:
||ZAB term||QJournal term||
|CEPOCH(e)|Response to getLastPromisedEpoch()|
|NEWEPOCH(e')|newEpoch(proposedEpoch)|
|ACK-E(...)|success response to newEpoch()|
Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. 
This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow.
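The JN-side acceptance rule discussed in the comment above (a JournalNode promises a proposed epoch only if it is strictly greater than its lastPromisedEpoch, so at most one NN can win a given epoch at each JN) can be sketched as follows. Class and method names are illustrative only, not the actual patch code:

```java
public class JournalNodeEpochSketch {
    private long lastPromisedEpoch = 1;

    // Sketch of the newEpoch() acceptance rule: reject any proposal that
    // is not strictly greater than what was already promised. Two NNs
    // racing with the same proposed epoch therefore cannot both succeed
    // at the same JN, which is what makes the quorum in step 4 exclusive.
    synchronized boolean newEpoch(long proposedEpoch) {
        if (proposedEpoch <= lastPromisedEpoch) {
            return false; // already promised this epoch or a higher one
        }
        lastPromisedEpoch = proposedEpoch;
        return true;
    }

    public static void main(String[] args) {
        JournalNodeEpochSketch jn = new JournalNodeEpochSketch();
        boolean nn1 = jn.newEpoch(2); // first proposer of epoch 2 wins
        boolean nn2 = jn.newEpoch(2); // concurrent proposer of epoch 2 fails
        System.out.println(nn1 + " " + nn2); // prints: true false
    }
}
```

Running the race from the example above against a single JN shows the first proposal of epoch 2 accepted and the second rejected; a quorum of such JNs yields at most one winning NN per epoch.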
[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1
[ https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3055: --- Attachment: HDFS-3055-b1.002.patch * add unit test * some fixes to NN unclean shutdown (to allow unit test to work) * better error reporting for the branch-1 edit log stuff (print out the offset when we encounter a problem) Implement recovery mode for branch-1 Key: HDFS-3055 URL: https://issues.apache.org/jira/browse/HDFS-3055 Project: Hadoop HDFS Issue Type: New Feature Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 1.0.0 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch Implement recovery mode for branch-1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3185) Setup configuration for Journal Manager and Journal Services
Setup configuration for Journal Manager and Journal Services Key: HDFS-3185 URL: https://issues.apache.org/jira/browse/HDFS-3185 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hari Mankude Assignee: Hari Mankude
[jira] [Created] (HDFS-3186) Sync lagging journal service from the active journal service
Sync lagging journal service from the active journal service Key: HDFS-3186 URL: https://issues.apache.org/jira/browse/HDFS-3186 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hari Mankude
[jira] [Updated] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3187: -- Attachment: hdfs-3187.txt Attached patch upgrades guava in the pom, and also fixes two calls to methods that have been removed in this version of guava. Unfortunately the QA bot won't be able to run this patch since it changes the top-level pom. Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3186) Sync lagging journal service from the active journal service
[ https://issues.apache.org/jira/browse/HDFS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Mankude reassigned HDFS-3186: -- Assignee: Hari Mankude Sync lagging journal service from the active journal service Key: HDFS-3186 URL: https://issues.apache.org/jira/browse/HDFS-3186 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Hari Mankude Assignee: Hari Mankude -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3187) Upgrade guava to 11.0.2
Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3187: -- Status: Patch Available (was: Open) Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245721#comment-13245721 ] Hadoop QA commented on HDFS-3187: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521212/hdfs-3187.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2171//console This message is automatically generated. Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
[ https://issues.apache.org/jira/browse/HDFS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3188: -- Attachment: hdfs-3188.txt Attached patch implements QuorumCall as described, and includes a unit test. Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3188.txt This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
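The QuorumCall idea described above (wrap a set of asynchronous calls, then wait for a given number of responses or a timeout) can be sketched in a few lines. This uses the JDK's CompletableFuture in place of Guava's ListenableFuture, and the names are hypothetical, not the attached patch:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QuorumWaitSketch {
    // Wait until at least `quorum` of the futures have completed (success
    // or failure), or the timeout elapses; return how many completed.
    static int waitForQuorum(List<CompletableFuture<Void>> calls,
                             int quorum, long timeoutMs) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(quorum);
        AtomicInteger completed = new AtomicInteger();
        for (CompletableFuture<Void> f : calls) {
            f.whenComplete((v, t) -> { completed.incrementAndGet(); latch.countDown(); });
        }
        latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<CompletableFuture<Void>> calls = List.of(
            CompletableFuture.completedFuture(null),  // journal 1 responded
            CompletableFuture.completedFuture(null),  // journal 2 responded
            new CompletableFuture<>());               // journal 3 never responds
        int responses = waitForQuorum(calls, 2, 100);
        System.out.println(responses >= 2); // prints: true (2 of 3 is a quorum)
    }
}
```

The key property, as in the design document, is that the caller makes progress once a majority responds rather than blocking on the slowest journal.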
[jira] [Created] (HDFS-3189) Add preliminary QJournalProtocol interface, translators
Add preliminary QJournalProtocol interface, translators --- Key: HDFS-3189 URL: https://issues.apache.org/jira/browse/HDFS-3189 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon This JIRA is to add the preliminary code for the QJournalProtocol. This protocol differs from JournalProtocol in the following ways: - each call has context information indicating the epoch number of the requester - it contains calls that are specific to epoch number generation, etc, which do not apply to other journaling daemons such as the BackupNode My guess is that, at some point, we can merge back down to one protocol, but during the initial implementation phase, it will be useful to have a distinct protocol for this project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
[ https://issues.apache.org/jira/browse/HDFS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3188: -- Status: Patch Available (was: Open) Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3188.txt This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245729#comment-13245729 ] Bikas Saha commented on HDFS-3077: -- Nice doc! Greatly sped up understanding the design instead of having to grok it from the patch :) I think it will help clarify the doc, if you add the explanation for Hari's example. Even though epoch 2 is persisted on JN1, its last log segment is still tied to epoch 1 and it needs to sync its last log segment with JN2/JN3. Are you proposing that JN1 drop its last edits in progress and pick up the corresponding finalized segment from JN1/JN2. Or is it TBD? Btw, there is some new code here but there seems to be some code in existing NN that changes the sequential journal sync to parallel (based on reading your doc and not your patch). I am guessing there will be other significant changes going forward. Are you planning on committing this to a branch or directly to trunk? Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-3189) Add preliminary QJournalProtocol interface, translators
[ https://issues.apache.org/jira/browse/HDFS-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3189: -- Attachment: hdfs-3189-prelim.txt Preliminary patch, there are still a couple TODOs/cleanup to do before this is committable. Add preliminary QJournalProtocol interface, translators --- Key: HDFS-3189 URL: https://issues.apache.org/jira/browse/HDFS-3189 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3189-prelim.txt This JIRA is to add the preliminary code for the QJournalProtocol. This protocol differs from JournalProtocol in the following ways: - each call has context information indicating the epoch number of the requester - it contains calls that are specific to epoch number generation, etc, which do not apply to other journaling daemons such as the BackupNode My guess is that, at some point, we can merge back down to one protocol, but during the initial implementation phase, it will be useful to have a distinct protocol for this project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Minor This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
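The utility described in the first bullet above (atomically and durably record a single long value, such as the last promised epoch) is commonly implemented as write-to-temp-then-rename. A sketch under that assumption; LongFileSketch is a made-up name, and a production version would also fsync the file and its directory before the rename:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LongFileSketch {
    // Write the value to a sibling temp file, then atomically rename it
    // over the target, so a crash mid-write never leaves a truncated or
    // half-written value on disk.
    static void writeLong(Path target, long value) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(value).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    static long readLong(Path target) throws IOException {
        return Long.parseLong(Files.readString(target).trim());
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempDirectory("epoch").resolve("last-promised-epoch");
        writeLong(f, 2);
        System.out.println(readLong(f)); // prints: 2
    }
}
```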
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245736#comment-13245736 ] Todd Lipcon commented on HDFS-3077: --- bq. I think it will help clarify the doc, if you add the explanation for Hari's example. Even though epoch 2 is persisted on JN1, its last log segment is still tied to epoch 1 and it needs to sync its last log segment with JN2/JN3. Are you proposing that JN1 drop its last edits in progress and pick up the corresponding finalized segment from JN1/JN2. Or is it TBD? Yes, I think it would see that its copy of the segment is out of date epoch-wise, delete it, and then copy the finalized segments from the other nodes later. I'll try to expand upon this portion of the doc in the coming days. I also have another idea which may be slightly simpler -- Suresh got me thinking about it a bit. Basically the idea is that, instead of deleting empty edit logs, we could fill them in with a single NOOP transaction. Let me think on this for a little while and then update the design doc if it turns out to work. bq. Btw, there is some new code here but there seems to be some code in existing NN that changes the sequential journal sync to parallel (based on reading your doc and not your patch). Nope, the thinking is that all of the new code will be encapsulated by QuorumJournalManager. So, from the NN's perspective, there is only a single edit log. It happens that that edit log is distributed and fault-tolerant underneath, but the NN would see it as a single required journal, and crash if it fails to sync. bq. Are you planning on committing this to a branch or directly to trunk? I'm happy to do either. Suresh seemed to think doing it on a branch would be counter-productive to code sharing. In practice it's almost new code, so as long as we're clear to mark it in-progress or experimental, I don't think it would be destabilizing to do in trunk. 
HDFS-3190 is the one place in which I've modified NN code, but only trivially. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3190: -- Attachment: hdfs-3190.txt Simple patch implements the above. Does not add unit tests since it's a straight refactor of existing code, and that code is covered by many existing tests. Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Minor Attachments: hdfs-3190.txt This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245739#comment-13245739 ] Bikas Saha commented on HDFS-2185: -- I think you are missing the failure arc when transitionToStandby is called in the InElection state. Is there any scope for admin operations in the ZKFC? Will the ZKFC receive and accept a signal (manual admin / automatic machine reboot) to stop services? At that point, in the InElection state, how will it know whether it needs to send transitionToStandby (based on whether it is active or not)?

HA: HDFS portion of ZK-based FailoverController --- Key: HDFS-2185 URL: https://issues.apache.org/jira/browse/HDFS-2185 Project: Hadoop HDFS Issue Type: Sub-task Components: auto-failover, ha Affects Versions: 0.24.0, 0.23.3 Reporter: Eli Collins Assignee: Todd Lipcon Fix For: Auto failover (HDFS-3042) Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.tex

This jira is for a ZK-based FailoverController daemon. The FailoverController is a separate daemon from the NN that does the following:
* Initiates leader election (via ZK) when necessary
* Performs health monitoring (aka failure detection)
* Performs fail-over (standby to active and active to standby transitions)
* Heartbeats to ensure liveness

It should have the same/similar interface as the Linux HA RM to aid pluggability.
[jira] [Updated] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3190: -- Assignee: Todd Lipcon Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245752#comment-13245752 ] Bikas Saha commented on HDFS-3077: -- bq. Nope, the thinking is that all of the new code will be encapsulated by QuorumJournalManager. So, from the NN's perspective, there is only a single edit log. It happens that that edit log is distributed and fault-tolerant underneath, but the NN would see it as a single required journal, and crash if it fails to sync.

Got it. So local edits and remote edits would be replaced by a single qjournal edits log.
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245755#comment-13245755 ] Zhanwei.Wang commented on HDFS-3179: I totally agree with you about the problem of one datanode with replication 3; I think this kind of operation should fail, or at least produce a warning. My opinion is that the purpose of the policy check is to make sure there is no potential data loss. In this one-datanode, three-replica case, although the first append failure will not cause data loss, the data appended after the first successful append is in danger, because there is only one replica rather than the three the user expected, and there is no warning to tell the user the truth. My suggestion is to make the first write to the empty file fail if there are not enough datanodes; in other words, make the policy check stricter, and make the error message friendlier than nodes.length != original.length + 1.

failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical

Create a single-datanode cluster, disable permissions, enable webhdfs, start HDFS, and run the test script. Expected result: a file named test is created and its content is testtest. Actual result: HDFS throws an exception on the second append operation.
{code}
./test.sh
{"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}}
{code}

Log in datanode:

{code}
2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
{code}

test.sh:

{code}
#!/bin/sh
echo test > test.txt
curl -L -X PUT "http://localhost:50070/webhdfs/v1/test?op=CREATE"
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
{code}
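For context: the "Failed to add a datanode" check that trips here is the client's replace-datanode-on-failure policy, which (if the configuration keys are available in the version in question) can be relaxed on clusters that have fewer datanodes than the replication factor. A sketch for hdfs-site.xml; this is a test-cluster workaround, not a fix for the unfriendly error message discussed above:

```xml
<property>
  <!-- On a cluster with fewer datanodes than the replication factor, the
       client can never find a replacement node during pipeline recovery.
       NEVER disables the replacement attempt entirely; appropriate only
       for small test setups, never for production clusters. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```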
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245765#comment-13245765 ] Bikas Saha commented on HDFS-3190: -- +1 lgtm.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245771#comment-13245771 ] Suresh Srinivas commented on HDFS-3077: --- bq. Suresh seemed to think doing it on a branch would be counter-productive to code sharing

There is a branch already created for HDFS-3092. We could use that.
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245774#comment-13245774 ] Todd Lipcon commented on HDFS-3110: --- my top comment got chopped somehow above: - I like the refactoring out of readPrepare and handleReadResult. But, these should be declared {{static}}

libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch

Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C.
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245773#comment-13245773 ] Todd Lipcon commented on HDFS-3110: ---
- I like the refactoring out of readPrepare and handleReadResult. But, these should be declared {{static}}
- I think your new patch was actually a delta vs the old patch, instead of a completely new one vs trunk. We need a new one for QA/commit.
- When NewDirectByteBuffer returns NULL with no errno set, I think it's better to set {{errno = ENOMEM;}} in an {{else}} clause -- just a little easier to read.
- The new flag HDFS_SUPPORTS_DIRECT_READ is only used internally, so I'm not sure it belongs in the public header hdfs.h (this is what users include, right?). Also, I think it would be better named something like {{HDFS_FILE_SUPPORTS_DIRECT_READ}}, since it refers to a specific stream rather than the entire FS.
- Rather than declaring it as a {{const}}, I think it's better to use an enum or #define, since consts are a C++ thing and this code is mostly straight C. Also, I think it's better to define it as (1 << 0) to indicate that this is going to be in a bitfield.
- Please add a comment above the definition of the new flag referring to hdfsFile_internal.flags, so we know where the flags end up.
- The new {{flags}} field should be unsigned -- {{uint32_t}} probably.
- In the new test, why are you hardcoding {{localhost:20300}}? I'd think using {{default}} as before is the right choice, since it will pick up whatever {{fs.default.name}} is in your {{core-site.xml}} on the classpath.
That way this same test can be run against local FS or against DFS.
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245776#comment-13245776 ] Sanjay Radia commented on HDFS-3092: Is there a way to turn off the striping even if the quorum size (Q) is less than the ensemble size (E)? We like the idea that each journal file contains ALL entries. Our default config: Q is 2 and the set of JDs is 3 (roughly equivalent to E).

Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf

Currently the standby namenode relies on reading shared editlogs to stay current with the active namenode for namespace changes. BackupNode used streaming edits from the active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading from a file will continue to be used, reading from the local edits instead of from the shared edits.
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245778#comment-13245778 ] Todd Lipcon commented on HDFS-3190: --- Thanks, Bikas. Quick question for reviewers: when I moved this code, I noticed the {{canRead()}} check. Currently, if the file exists but can't be read, it returns the default value. I thought this was a little suspicious. Is anyone averse to removing that check, so that we throw an exception if the file exists but we can't read it? Or is it better to keep this as a straight refactor and file a follow-up to think about that?
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245785#comment-13245785 ] Bikas Saha commented on HDFS-3077: -- I have a question about syncing journal nodes and quorum-based writes. There will always be a case where a lost journal node comes back up and is syncing its state; the extreme example of this is replacement of a broken journal node with a new one. While it is doing this, will it be part of the quorum when a quorum number of writes must succeed? Say we have 3 journals with the following txids:
JN1-100, JN2-100, JN3-0 (JN3 just joined)
Now say some writes go to JN2 and JN3 (a quorum commit, with JN1's records in flight in the queue because JN1 is slow):
JN1-100, JN2-110, JN3-110+syncing_holes
At this point something terrible happens, and when we recover we can only access JN1 and JN3:
JN1-100, JN3-110+syncing_holes
How do we then resolve the ground truth about the journal state and edit logs?
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245806#comment-13245806 ] Todd Lipcon commented on HDFS-3077: --- Hi Bikas. Thanks for bringing up this scenario. I do need to add a section to the doc about failure handling and re-adding failed journals. My thinking is that the granularity of membership is the log segment. This is similar to what we do on local disks today - when we roll the edit log, we attempt to re-add any disks that previously failed. Similarly, when we start a new log segment, we give all of the JNs a chance to pick back up following along with the quorum. To try to map to your example, we'd have the following: JN1: writing edits_inprogress_1 (@txn 100) JN2: writing edits_inprogress_1 (@txn 100) JN3: has been reformatted, comes back online At this point, the QJM can try to write txns to all three, but JN3 won't accept transactions because it doesn't have a currently open log segment. Currently it will just reject them. I can imagine a future optimization in which it would return a special exception, and the QJM could notify the NN that it would like to roll ASAP if possible. Let's say we write another 20 txns, and then roll logs. On the next startLogSegment call, we'd end up with the following: JN1: edits_1-120, edits_inprogress_121 JN2: edits_1-120, edits_inprogress_121 JN3: edits_inprogress_121 so all nodes are now taking part in the quorum. We could optionally at this point have JN3 copy over the edits_1-120 segment from one of the other nodes, but that copy can be asynchronous. It's a repair operation, but given we already have 2 valid replicas, we aren't in any imminent danger of data loss. 
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245817#comment-13245817 ] Eli Collins commented on HDFS-3187: --- +1 looks good

Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt

Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s.
[jira] [Commented] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
[ https://issues.apache.org/jira/browse/HDFS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245818#comment-13245818 ] Hadoop QA commented on HDFS-3188: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521216/hdfs-3188.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2172//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2172//console This message is automatically generated.

Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3188.txt

This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout.
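The quorum-wait behavior described for {{QuorumCall}} can be illustrated without Guava. A rough sketch of the core idea (wait until a majority of per-journal calls succeed, fail early once a majority can no longer be reached, or time out), using the JDK's CompletableFuture and made-up names rather than the actual ListenableFuture-based API in the patch:

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only, not the patch's QuorumCall implementation.
class QuorumWait {
    static <T> void awaitQuorum(List<CompletableFuture<T>> calls, int quorum,
                                long timeoutMs) throws Exception {
        CompletableFuture<Void> done = new CompletableFuture<>();
        AtomicInteger successes = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        for (CompletableFuture<T> call : calls) {
            call.whenComplete((result, err) -> {
                if (err == null) {
                    if (successes.incrementAndGet() >= quorum) {
                        done.complete(null); // enough responses: quorum reached
                    }
                } else if (failures.incrementAndGet() > calls.size() - quorum) {
                    // so many calls failed that a quorum can never be reached
                    done.completeExceptionally(new IOException(
                        "quorum of " + quorum + " unreachable", err));
                }
            });
        }
        // Blocks for quorum success, quorum failure, or the timeout.
        done.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Note that the caller returns as soon as a quorum responds; slow or dead journals are simply left behind, which is the property the writer relies on.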
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245827#comment-13245827 ] Todd Lipcon commented on HDFS-3187: --- Thanks Eli. I double-checked that all the MR, HDFS, and Common tests and code still compile with this change. I didn't run the full suite, but the new guava release is compatible with the old aside from the {{Files}} changes I dealt with in the patch. Will commit momentarily.
[jira] [Updated] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3187: -- Resolution: Fixed Fix Version/s: 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245847#comment-13245847 ] Suresh Srinivas commented on HDFS-3077: --- bq. How can step 3 in section 2.4 be completed independent of quorum? Step 4 indicates that it requires a quorum of nodes to respond successfully to the newEpoch message. Here's an example:

What I meant was that at each JN, step 3 completes. Hence the example Hari was giving.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245850#comment-13245850 ] Suresh Srinivas commented on HDFS-3077: --- bq. so all nodes are now taking part in the quorum. We could optionally at this point have JN3 copy over the edits_1-120 segment from one of the other nodes, but that copy can be asynchronous. It's a repair operation, but given we already have 2 valid replicas, we aren't in any imminent danger of data loss.

The proposal in HDFS-3092 is to make JN3 part of the quorum only when it has caught up with the other JNs. Having this simplifies some boundary conditions.
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245854#comment-13245854 ] Hudson commented on HDFS-3187: -- Integrated in Hadoop-Hdfs-trunk-Commit #2060 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2060/]) HDFS-3187. Upgrade guava to 11.0.2. Contributed by Todd Lipcon. (Revision 1309181) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309181 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 2.0.0 Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s.
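The {{MoreExecutors.listeningDecorator}} feature mentioned in the issue description can be used roughly as follows; this is a generic Guava usage sketch (assuming Guava 11+ on the classpath), not code from the HDFS patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class ListeningDecoratorDemo {
    static int runTask() throws Exception {
        ExecutorService plain = Executors.newSingleThreadExecutor();
        // Wrap a plain ExecutorService so that submit() returns ListenableFutures
        // instead of plain Futures.
        ListeningExecutorService listening = MoreExecutors.listeningDecorator(plain);
        ListenableFuture<Integer> future = listening.submit(() -> 40 + 2);
        int result = future.get(); // blocks until the task completes
        listening.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTask()); // prints 42
    }
}
```

The returned {{ListenableFuture}} also supports completion listeners, which is what makes it attractive for coordinating many asynchronous RPCs to JournalNodes in HDFS-3077.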
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245857#comment-13245857 ] Hudson commented on HDFS-3187: -- Integrated in Hadoop-Common-trunk-Commit #1985 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1985/]) HDFS-3187. Upgrade guava to 11.0.2. Contributed by Todd Lipcon. (Revision 1309181) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309181 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-3110: - Attachment: HDFS-3110.3.patch New patch that's actually a diff vs trunk this time :/ I incorporated most of Todd's suggestions. I've left HDFS_FILE_SUPPORTS_DIRECT_READ in hdfs.h for now so that users who *really* want to turn off support for some reason (perhaps a bug) have access to the flag that they can set in hdfsFile's guts. I ran the tests against the default local filesystem when no fs.default.name is set, and observed no errors except that the tests expect readDirect to be available. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, HDFS-3110.3.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C.
[jira] [Reopened] (HDFS-1378) Edit log replay should track and report file offsets in case of errors
[ https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reopened HDFS-1378: Assignee: Colin Patrick McCabe (was: Aaron T. Myers) I'd like to port this to branch-1 so that we can have better error messages there. It should be a trivial port. Any objections? Edit log replay should track and report file offsets in case of errors -- Key: HDFS-1378 URL: https://issues.apache.org/jira/browse/HDFS-1378 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Fix For: 0.23.0 Attachments: hdfs-1378-branch20.txt, hdfs-1378.0.patch, hdfs-1378.1.patch, hdfs-1378.2.txt Occasionally there are bugs or operational mistakes that result in corrupt edit logs which I end up having to repair by hand. In these cases it would be very handy to have the error message also print out the file offsets of the last several edit log opcodes so it's easier to find the right place to edit in the OP_INVALID marker. We could also use this facility to provide a rough estimate of how far along edit log replay the NN is during startup (handy when a 2NN has died and replay takes a while)
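The offset tracking this issue describes might be sketched like this; the class and method names here are illustrative, not the ones in the patch. A wrapper stream counts bytes as opcodes are read, so a corruption error can report where in the edit-log file it occurred:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only (not the actual Hadoop classes): wrap the edit-log
// input stream so every read advances a byte counter, letting "corrupt edit
// log" error messages include the file offset of the bad opcode.
class PositionTrackingInputStream extends FilterInputStream {
    private long pos = 0;

    PositionTrackingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) pos++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) pos += n;
        return n;
    }

    /** Offset to report in error messages, and to estimate replay progress. */
    long getPos() {
        return pos;
    }
}
```

The same counter doubles as a progress indicator: comparing {{getPos()}} against the log file's length gives the rough replay-progress estimate mentioned in the description.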
[jira] [Updated] (HDFS-1378) Edit log replay should track and report file offsets in case of errors
[ https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-1378: --- Attachment: HDFS-1378-b1.002.patch * port to branch-1
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245867#comment-13245867 ] Hudson commented on HDFS-3187: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1998 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1998/]) HDFS-3187. Upgrade guava to 11.0.2. Contributed by Todd Lipcon. (Revision 1309181) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309181 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245871#comment-13245871 ] Hadoop QA commented on HDFS-3190: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521220/hdfs-3190.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2173//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2173//console This message is automatically generated. Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3190.txt This JIRA is for some simple refactors in the NN:
- refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch
- refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory.
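A minimal sketch of what the proposed LongContainingFile utility might do (the class name comes from the description above, but this implementation is hypothetical): durability comes from syncing a temp file to disk before atomically renaming it over the target, the usual pattern for files like seen_txid:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a LongContainingFile-style utility (not the actual
// Hadoop implementation): persist a single long durably by writing a temp
// file, fsync-ing it, then renaming it over the target file.
public class LongContainingFile {
    public static void writeLong(File target, long value) throws IOException {
        File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(Long.toString(value).getBytes(StandardCharsets.UTF_8));
            out.getFD().sync(); // durably flush the contents before renaming
        }
        if (!tmp.renameTo(target)) {
            throw new IOException("rename of " + tmp + " to " + target + " failed");
        }
    }
}
```

On POSIX filesystems the rename is atomic, so a crash at any point leaves either the old value or the new one on disk, never a partially written file; that is the property a JournalNode needs when recording its last promised epoch.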