[jira] [Commented] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than block size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245030#comment-13245030 ]

Hadoop QA commented on HDFS-3181:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12521105/repro.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestLeaseRecovery2
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//console

This message is automatically generated.

testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than block size
----------------------------------------------------------------------------------------------------------

                 Key: HDFS-3181
                 URL: https://issues.apache.org/jira/browse/HDFS-3181
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Colin Patrick McCabe
            Priority: Critical
         Attachments: TestLeaseRecovery2with1535.patch, repro.txt, testOut.txt

org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins.

{code}
org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart

Failing for the past 1 build (Since Failed#2163 )
Took 8.4 sec.
Error Message

Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)

Stacktrace

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
	at
[jira] [Updated] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liaowenrui updated HDFS-3175:
-----------------------------

    Attachment: HDFS-3175.patch
                HDFS-3175.patch

When the disk space is available back,Namenode resource monitor can automatically take off safemode.
-----------------------------------------------------------------------------------------------------

                 Key: HDFS-3175
                 URL: https://issues.apache.org/jira/browse/HDFS-3175
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 0.24.0, 2.0.0
            Reporter: liaowenrui
         Attachments: HDFS-3175.patch, HDFS-3175.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245066#comment-13245066 ]

liaowenrui commented on HDFS-3175:
----------------------------------

thank you very much
[jira] [Updated] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liaowenrui updated HDFS-3175:
-----------------------------

    Attachment: testcase
[jira] [Updated] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3077:
------------------------------

    Attachment: qjournal-design.pdf

Attached a design doc draft. Look forward to your comments.

Quorum-based protocol for reading and writing edit logs
-------------------------------------------------------

                 Key: HDFS-3077
                 URL: https://issues.apache.org/jira/browse/HDFS-3077
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ha, name-node
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
         Attachments: hdfs-3077-partial.txt, qjournal-design.pdf

Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow.
[jira] [Commented] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than block size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245071#comment-13245071 ]

Todd Lipcon commented on HDFS-3181:
-----------------------------------

Jenkins seems to have reproduced this above.
[jira] [Updated] (HDFS-3175) When the disk space is available back,Namenode resource monitor can automatically take off safemode.
[ https://issues.apache.org/jira/browse/HDFS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liaowenrui updated HDFS-3175:
-----------------------------

    Attachment: HDFS-3175.patch

modify format
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245112#comment-13245112 ]

Flavio Junqueira commented on HDFS-3092:
----------------------------------------

Hi Suresh, Thanks for sharing a design document. I have a few comments and questions, if you don't mind:

# I find this design to be very close to BookKeeper, with a few important differences. One noticeable difference that has been mentioned elsewhere is that bookies implement mechanisms to enable high performance when there are multiple concurrent ledgers being written to. Your design does not seem to consider the possibility of multiple concurrent logs, which you may want to have for federation. Federation will be useful for large deployments, but not for small deployments. It sounds like a good idea to have a solution that covers both cases.
# There have been comments about comparing the different approaches discussed, and I was wondering what criteria you have been thinking of using to compare them. I guess it can't be performance because, as the numbers Ivan has generated show, the current bottleneck is the namenode code, not the logging. Until the existing bottlenecks in the namenode code are removed, having a fast logging mechanism won't make much difference with respect to throughput.
# I was wondering how reads of the log are executed if writes only have to reach a majority quorum. Once it is time to read, how does the reader get a consistent view of the log? One JD alone may not have all entries, so I suppose the reader may need to read from multiple JDs to get a consistent view? Do the transaction identifiers establish the order of entries in the log? One quick note is that I don't see why a majority is required; BK does not require a majority.

Here are some notes I took comparing the BK approach with the one in this jira, in case you're interested:

# *Rolling*: The notion of rolling here is equivalent to closing a ledger and creating a new one. As ledgers are identified with numbers that are monotonically increasing, the ledger identifiers can be used to order the sequence of logs created over time.
# *Single writer*: Only one client can add new entries to a ledger. We have the notion of a recovery client, which is essentially a reader that makes sure that there is agreement on the end of the ledger. Such a recovery client is also able to write entries, but these writes are simply to make sure that there is enough replication.
# *Fencing*: We fence ledgers individually, so that we guarantee that all operations a ledger writer returns successfully are persisted on enough bookies. This is different from the approach proposed here, which essentially fences logging as a whole.
# *Split brain*: In a split-brain situation, BK can have two writers each writing to a different ledger. However, my understanding is that a namenode that is failing over cannot make progress without reading the previous log (ledger); consequently, this situation cannot occur with BK, and we don't require writes to a majority for correctness.
# *Adding JDs*: The mechanism described here mentions explicitly adding a new JD. My understanding is that a new JD is brought up and it is told somehow to connect to the namenode and to another JD in the JournalList to sync up. BK currently only picks bookies from a pool of available bookies through zookeeper. It shouldn't be a problem to allow a fixed list of bookies to be passed upon creating a ledger.
# *Striping*: BK implements striping, although that's an optional feature. It is possible to use a configuration like 2-2 or 3-3 (Q-N, Q=quorum size and N=ensemble size).
# *Failure detection*: BK uses zookeeper ephemeral nodes to track bookies that are available. A client also changes its ensemble view if it loses a bookie, by adding a new bookie. I'm not exactly sure how you monitor crashes here. Is it the namenode that keeps track of which JDs in the JournalList are available?
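[Editor's note] The question above about consistent reads under majority writes can be illustrated with a toy sketch. This is not HDFS or BookKeeper code; all names and data structures are invented for illustration. The key property is that any two majorities of the same ensemble intersect, so a reader that merges logs from any majority sees every entry whose write was acknowledged by a majority, and transaction ids give the total order:

```python
# Toy model: 3 journal daemons (JDs), each a dict with a txid -> entry log
# and an "up" flag. A write commits once a majority of JDs ack it.
MAJORITY = 2  # of 3 in this example

def write(journals, txid, entry, acks_needed=MAJORITY):
    """Append entry to every reachable JD; 'commit' iff a majority acked."""
    acked = 0
    for j in journals:
        if j.get("up", True):
            j["log"][txid] = entry
            acked += 1
    return acked >= acks_needed

def read_consistent(journals):
    """Merge logs from any majority of JDs; sorting by txid recovers the
    log order. Any committed txid is on at least one JD in the majority,
    because two majorities of the same ensemble always intersect."""
    reachable = [j for j in journals if j.get("up", True)]
    assert len(reachable) >= MAJORITY, "need a majority to read"
    merged = {}
    for j in reachable:
        merged.update(j["log"])  # every copy of a given txid is identical
    return [merged[t] for t in sorted(merged)]
```

For example, if JD3 is down during the write and JD1 is down during the read, the reader still recovers the entry from JD2, which is in both majorities.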
Enable journal protocol based editlog streaming for standby namenode
---------------------------------------------------------------------

                 Key: HDFS-3092
                 URL: https://issues.apache.org/jira/browse/HDFS-3092
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: ha, name-node
    Affects Versions: 0.24.0, 0.23.3
            Reporter: Suresh Srinivas
            Assignee: Suresh Srinivas
         Attachments: MultipleSharedJournals.pdf

Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. BackupNode used streaming edits from active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in standby will get the editlogs from the active and write it to
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245119#comment-13245119 ]

amith commented on HDFS-3179:
-----------------------------

Hi Zhanwei Wang,

I don't know exactly what your test script does, but this looks similar to HDFS-3091. Can you check this once: https://issues.apache.org/jira/browse/HDFS-3091

Please correct me if I am wrong :)

failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
------------------------------------------------------------------------------------------------------------------------

                 Key: HDFS-3179
                 URL: https://issues.apache.org/jira/browse/HDFS-3179
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.23.2
            Reporter: Zhanwei.Wang
            Priority: Critical

Steps to reproduce:
* create a single datanode cluster
* disable permissions
* enable webhdfs
* start hdfs
* run the test script

Expected result: a file named test is created and the content is "testtest".

The result I got: hdfs throws an exception on the second append operation.

{code}
./test.sh
{"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}}
{code}

Log in datanode:

{code}
2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
{code}

test.sh:

{code}
#!/bin/sh
echo test > test.txt
curl -L -X PUT "http://localhost:50070/webhdfs/v1/test?op=CREATE";
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND";
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND";
{code}
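[Editor's note] The check that fails here can be sketched in miniature. This is illustrative Python, not the actual DFSOutputStream logic, and the function names are invented. During pipeline recovery on append, the client asks the namenode for a pipeline one datanode longer than the one that failed; on a single-datanode cluster there is no extra node to offer, so the length check can never pass:

```python
def additional_datanode(original, cluster):
    """Namenode side (simplified): return the old pipeline plus at most
    one datanode that is not already in it."""
    extras = [d for d in cluster if d not in original]
    return original + extras[:1]

def check_replacement(original, cluster):
    """Client side (simplified): insist that a replacement was found,
    i.e. the new pipeline is exactly one node longer than the old one."""
    nodes = additional_datanode(original, cluster)
    if len(nodes) != len(original) + 1:
        # Mirrors the message in the report above.
        raise IOError("Failed to add a datanode: nodes.length != "
                      "original.length + 1, nodes=%s, original=%s"
                      % (nodes, original))
    return nodes
```

In practice, the dfs.client.block.write.replace-datanode-on-failure client settings control whether this replacement step is attempted at all, which matters on clusters with very few datanodes.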
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245140#comment-13245140 ]

Uma Maheswara Rao G commented on HDFS-3179:
-------------------------------------------

@Zhanwei, How many DNs are running in your test cluster?
[jira] [Updated] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Singhi updated HDFS-3119:
--------------------------------

    Attachment: HDFS-3119-1.patch

Thanks Brandon for being so kind. Thanks Uma, Nicholas and Brandon for the patch review and comments. The latest patch addresses Uma's, Nicholas's and Brandon's comments. Also added a test case for the patch. Thanks Uma for your offline help as well :)

Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
--------------------------------------------------------------------------------------------------------------------------

                 Key: HDFS-3119
                 URL: https://issues.apache.org/jira/browse/HDFS-3119
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.24.0
            Reporter: J.Andreina
            Assignee: Ashish Singhi
            Priority: Minor
              Labels: patch
             Fix For: 0.24.0, 0.23.2
         Attachments: HDFS-3119-1.patch, HDFS-3119.patch

Cluster setup: 1 NN, 2 DN, replication factor 2, block report interval 3 sec, block size 256 MB

step1: write a file filewrite.txt of size 90 bytes with sync (not closed)
step2: change the replication factor to 1 using the command: ./hdfs dfs -setrep 1 /filewrite.txt
step3: close the file

* At the NN side, the "Decreasing replication from 2 to 1 for /filewrite.txt" log has occurred, but the overreplicated blocks are not deleted even after the block report is sent from the DN
* While listing the file in the console using ./hdfs dfs -ls, the replication factor for that file is shown as 1
* The fsck report for that file displays that the file is replicated to 2 datanodes
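[Editor's note] The invariant this report says is violated can be stated in a few lines. This is an illustrative Python sketch, not NameNode code: once the file is closed with replication factor 1, one of the two replicas reported by the datanodes is excess and should be scheduled for deletion.

```python
def excess_replicas(reported, replication_factor):
    """Replicas beyond the target factor are excess and should be
    invalidated; with no excess, nothing is scheduled for deletion."""
    excess_count = len(reported) - replication_factor
    return reported[replication_factor:] if excess_count > 0 else []
```

The real namenode picks which replica to invalidate using placement heuristics (rack awareness, free space); the point here is only that the replica count should drop to the factor, which is what the bug report says was not happening.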
[jira] [Updated] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Singhi updated HDFS-3119:
--------------------------------

    Labels: patch (was: )
    Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245282#comment-13245282 ]

Hadoop QA commented on HDFS-3119:
---------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12521139/HDFS-3119-1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2168//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2168//console

This message is automatically generated.
[jira] [Commented] (HDFS-3120) Enable hsync and hflush by default
[ https://issues.apache.org/jira/browse/HDFS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245287#comment-13245287 ] Hudson commented on HDFS-3120: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) Previous commit was for HDFS-3120, fixing up CHANGES.txt (Revision 1308615) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308615 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable hsync and hflush by default -- Key: HDFS-3120 URL: https://issues.apache.org/jira/browse/HDFS-3120 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.0 Attachments: hdfs-3120.txt, hdfs-3120.txt The work on branch-20-append was to support *sync*, for durable HBase WALs, not *append*. The branch-20-append implementation is known to be buggy. There's been confusion about this, we often answer queries on the list [like this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable correct sync on branch-1 for HBase is to set dfs.support.append to true in your config, which has the side effect of enabling append (which we don't want to do). Let's add a new *dfs.support.sync* option that enables working sync (which is basically the current dfs.support.append flag modulo one place where it's not referring to sync). For compatibility, if dfs.support.append is set, dfs.support.sync will be set as well. This way someone can enable sync for HBase and still keep the current behavior that if dfs.support.append is not set then an append operation will result in an IOE indicating append is not supported. We should do this on trunk as well, as there's no reason to conflate hsync and append with a single config even if append works. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3130) Move FSDataset implementation to a package
[ https://issues.apache.org/jira/browse/HDFS-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245285#comment-13245285 ] Hudson commented on HDFS-3130: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) HDFS-3130. Move fsdataset implementation to a package. (Revision 1308437) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308437 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaAlreadyExistsException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaNotFoundException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaUnderRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaWaitingToBeRecovered.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicasMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/LDir.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RollingLogsImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java *
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245289#comment-13245289 ] Hudson commented on HDFS-3148: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308617) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308614) HADOOP-8210. Common side of HDFS-3148: The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308457) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308617 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308614 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/TestFiPipelines.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/permission/TestStickyBit.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/FileAppendTest4.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend2.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend3.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend4.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreationDelete.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPipelines.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReadWhileWriting.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRenameWhileOpen.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestDatanodeRestart.java eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308457 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/DNS.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetUtils.java The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt
[jira] [Commented] (HDFS-3126) Journal stream from the namenode to backup needs to have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245293#comment-13245293 ] Hudson commented on HDFS-3126: -- Integrated in Hadoop-Hdfs-trunk #1004 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/]) HDFS-3126. Journal stream from Namenode to BackupNode needs to have timeout. Contributed by Hari Mankude. (Revision 1308636) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308636 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java Journal stream from the namenode to backup needs to have a timeout -- Key: HDFS-3126 URL: https://issues.apache.org/jira/browse/HDFS-3126 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Fix For: 0.24.0 Attachments: hdfs-3126.patch, hdfs-3126.patch
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245314#comment-13245314 ] Suresh Srinivas commented on HDFS-3148: --- Hey guys, can you do this work in a separate branch as well? There is too much going on to catch up on. I have not had time to look into the proposal, and my feeling was: is this complexity worth adding? Though I have not had time to think about how much complexity this feature adds. Also, is Daryn's concern addressed? The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as for the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. Binding the client socket to a specific address is not by itself sufficient to ensure egress traffic uses that interface. E.g., if multiple interfaces are on the same subnet, the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however, it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require clients to have. 
Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is actually useful independent of the other HDFS-3140 changes.
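The bind-before-connect approach described above can be sketched like this (illustrative Python rather than the actual Java client code; the helper names are invented, and as the description notes, binding alone may not be enough without source-address routing rules on the host):

```python
import random
import socket

def pick_local_address(addrs):
    # Random selection spreads connections across interfaces; configuring
    # a single address pins the client to that interface (the policy
    # described for dfs.client.local.interfaces).
    return random.choice(addrs)

def connect_from(local_addrs, remote_addr):
    """Bind the outbound socket to a chosen local address before
    connecting, so egress traffic can use that interface."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((pick_local_address(local_addrs), 0))  # port 0 = ephemeral
    s.connect(remote_addr)
    return s

# Demo against a loopback listener.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
c = connect_from(["127.0.0.1"], srv.getsockname())
local_ip = c.getsockname()[0]
print(local_ip)  # -> 127.0.0.1
c.close()
srv.close()
```

This is why SO_BINDTODEVICE is not needed for the common case: plain bind() selects the source address, and the OS routing configuration does the rest.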
[jira] [Commented] (HDFS-3120) Enable hsync and hflush by default
[ https://issues.apache.org/jira/browse/HDFS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245335#comment-13245335 ] Hudson commented on HDFS-3120: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) Previous commit was for HDFS-3120, fixing up CHANGES.txt (Revision 1308615) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308615 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable hsync and hflush by default -- Key: HDFS-3120 URL: https://issues.apache.org/jira/browse/HDFS-3120 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.0 Attachments: hdfs-3120.txt, hdfs-3120.txt The work on branch-20-append was to support *sync*, for durable HBase WALs, not *append*. The branch-20-append implementation is known to be buggy. There's been confusion about this, we often answer queries on the list [like this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable correct sync on branch-1 for HBase is to set dfs.support.append to true in your config, which has the side effect of enabling append (which we don't want to do). Let's add a new *dfs.support.sync* option that enables working sync (which is basically the current dfs.support.append flag modulo one place where it's not referring to sync). For compatibility, if dfs.support.append is set, dfs.support.sync will be set as well. This way someone can enable sync for HBase and still keep the current behavior that if dfs.support.append is not set then an append operation will result in an IOE indicating append is not supported. We should do this on trunk as well, as there's no reason to conflate hsync and append with a single config even if append works. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3130) Move FSDataset implementation to a package
[ https://issues.apache.org/jira/browse/HDFS-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245333#comment-13245333 ] Hudson commented on HDFS-3130: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) HDFS-3130. Move fsdataset implementation to a package. (Revision 1308437) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308437 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaAlreadyExistsException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * 
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245337#comment-13245337 ] Hudson commented on HDFS-3148: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308617) HDFS-3148. The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308614) HADOOP-8210. Common side of HDFS-3148: The client should be able to use multiple local interfaces for data transfer. Contributed by Eli Collins (Revision 1308457) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308617 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308614 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * 
[jira] [Commented] (HDFS-3126) Journal stream from the namenode to backup needs to have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245341#comment-13245341 ] Hudson commented on HDFS-3126: -- Integrated in Hadoop-Mapreduce-trunk #1039 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/]) HDFS-3126. Journal stream from Namenode to BackupNode needs to have timeout. Contributed by Hari Mankude. (Revision 1308636) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1308636 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java Journal stream from the namenode to backup needs to have a timeout -- Key: HDFS-3126 URL: https://issues.apache.org/jira/browse/HDFS-3126 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Fix For: 0.24.0 Attachments: hdfs-3126.patch, hdfs-3126.patch
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245346#comment-13245346 ] Suresh Srinivas commented on HDFS-3092: --- bq. Your design does not seem to consider the possibility of multiple concurrent logs, which you may want to have for federation. For HDFS editlogs, my feeling is that there will only be three JDs: one on the active namenode, a second on the standby, and a third JD on one of the other machines. In federation, one has to configure a JD per federated namespace. An alternative is to use BookKeeper, since it could make the deployment simpler for a large federated cluster. bq. There have been comments about comparing the different approaches discussed, and I was wondering what criteria you have been thinking of using to compare them. I think the comment was more about comparing the design and the complexity of deployment, not about benchmarks for the two systems. Performance is not the motivation for this jira. bq. I was wondering about how reads to the log are executed if writes only have to reach a majority quorum. Once it is time to read, how does the reader get a consistent view of the log? One JD alone may not have all entries, so I suppose the reader may need to read from multiple JDs to get a consistent view? Do the transaction identifiers establish the order of entries in the log? One quick note is that I don't see why a majority is required; bk does not require a majority. We decided on a majority quorum to keep the design simple, though it is strictly not necessary. A JD in the JournalList is supposed to have all the entries, and any JD from the list can be used to read the journals. bq. Here are some notes I took comparing the bk approach with the one in this jira, in case you're interested I noticed that as well. After we went through the many issues this solution had to take care of, it looks very similar to BK. 
That is comforting :-) Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf Currently the standby namenode relies on reading shared editlogs to stay current with the active namenode for namespace changes. BackupNode used streaming edits from the active namenode for the same purpose. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading edits from a file will continue to be used, reading from the local edits instead of from shared edits.
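The majority-quorum rule discussed in the comment above (chosen for simplicity, with three JDs in the typical deployment) can be sketched as follows; this is an illustrative model of the durability check, not the actual journal daemon code:

```python
def write_is_durable(num_jds, acks):
    # A write is considered durable once a strict majority of the
    # journal daemons (JDs) have acknowledged it.
    return acks > num_jds // 2

# Three JDs, as in the deployment sketched in the comment:
print(write_is_durable(3, 2))  # -> True
print(write_is_durable(3, 1))  # -> False
```

Note the comment's caveat that a majority is not strictly necessary; the design trades generality for simplicity, and a JD on the JournalList is expected to hold all entries so readers can use any one of them.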
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245366#comment-13245366 ] Daryn Sharp commented on HDFS-3148: --- Also, is Daryn's concern addressed? I believe so. Part of the confusion was that I didn't fully comprehend Eli's earlier responses. Todd made a great point that we need to ensure we have really good documentation for the feature. It's going to require system-level configuration to work correctly. The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as for the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. Binding the client socket to a specific address is not by itself sufficient to ensure egress traffic uses that interface. E.g., if multiple interfaces are on the same subnet, the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however, it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require clients to have. 
Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is useful independently of the other HDFS-3140 changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
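[Editor's note] The bind-then-connect mechanism described in HDFS-3148 can be illustrated with a small sketch. This is not HDFS code; the class name and loopback addresses are hypothetical, chosen so the example is self-contained. Binding the client socket to a local address before connecting is what makes the kernel stamp that source address on egress traffic, which the host's IP rules can then use for interface selection:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BindLocalInterface {
    // Connect to a remote endpoint, binding the local end of the socket to
    // the given local address first (port 0 = any ephemeral port). Per the
    // comment above, bind() selects the source address; the OS routing
    // rules then decide which interface carries the outbound packets.
    public static Socket connectFrom(String localAddr, InetSocketAddress remote)
            throws IOException {
        Socket s = new Socket();
        s.bind(new InetSocketAddress(localAddr, 0));
        s.connect(remote, 5000); // 5 s connect timeout
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate on loopback so the example runs anywhere.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket s = connectFrom("127.0.0.1",
                    new InetSocketAddress("127.0.0.1", server.getLocalPort()));
            // The local end is bound to the address we asked for.
            System.out.println(s.getLocalAddress().getHostAddress());
            s.close();
        }
    }
}
```

On a multi-homed host, passing a different local interface address to connectFrom would pin that client's traffic to that interface, subject to the routing-rule caveat the comment raises.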
[jira] [Updated] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3166: -- Attachment: HDFS-3166.patch Linux considers the requested listen backlog as advisory... It rounds it up to the next power of 2, with a floor of 16. I modified the test to try up to 32 times to trigger a connect timeout. Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
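[Editor's note] The missing-timeout failure mode in HDFS-3166 comes down to two setters on the JDK's URLConnection. This is a minimal sketch, not the actual patch; the class name and the 60-second value are illustrative only:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    // Without these two calls, connect() or read() can block indefinitely
    // when the remote host disappears mid-transfer -- the hung-socket
    // behavior described in HDFS-3166.
    public static HttpURLConnection open(URL url, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMs); // bound time to establish the TCP connection
        conn.setReadTimeout(timeoutMs);    // bound time waiting for each read
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // openConnection() does not touch the network, so this runs offline.
        HttpURLConnection c = open(new URL("http://example.invalid/file"), 60000);
        System.out.println(c.getConnectTimeout() + " " + c.getReadTimeout());
    }
}
```

With the timeouts set, a network outage surfaces as a java.net.SocketTimeoutException instead of a permanently hung thread.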
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245475#comment-13245475 ] Flavio Junqueira commented on HDFS-3092: Thanks for the responses, Suresh. bq. For HDFS editlogs, my feeling is that there will only be three JDs. One on the active namenode, second on the standby and a third JD on one of the machines. In federation, one has to configure a JD per Federated namespace. Alternative is to use BookKeeper, since it could make the deployment simpler for federated large cluster. When you say three JDs, that's the degree of replication, right? When I said multiple logs, I was referring to multiple namenodes writing to different logs, as with federation. bq. I think the comment was more about comparing the design and complexity of deployment and not benchmarks for two systems. Performance is not the motivation for this jira. Got it. You're thinking about a qualitative design based on the requirements identified. Correctness sounds like an obvious candidate. :-) bq. We decided on majority quorum to keep the design simple, though it is strictly not necessary. A JD in JournalList is supposed to have all the entries and any JD from the list can be used to read the journals. I think my confusion here is that you require a quorum to be able to acknowledge the operation, but in reality you try to write to everyone. If you can't write to everyone, then you induce a view change (change to JournalList). Is this right? Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. 
BackupNode used streaming edits from the active namenode to do the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading from a file will continue to be used, except that it will read from the local edits instead of the shared edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245476#comment-13245476 ] Zhanwei.Wang commented on HDFS-3179: @Uma and amith It seems to be the same issue as HDFS-3091. I configured only one datanode and created a file using the default number of replicas (3); existings(1) <= replication/2 (3/2==1) will be satisfied, and it cannot replace the failed node with a new one as no extra nodes exist in the cluster. The HDFS-3091 patch should be applied to the 0.23.2 branch. failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical Create a single datanode cluster, disable permissions, enable webhdfs, start hdfs, run the test script. Expected result: a file named test is created and the content is testtest. The result I got: hdfs throws an exception on the second append operation. 
{code} ./test.sh {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}} {code} Log in datanode: {code} 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {code} test.sh {code} #!/bin/sh echo test > test.txt curl -L -X PUT http://localhost:50070/webhdfs/v1/test?op=CREATE; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
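[Editor's note] For operators reproducing the single-datanode failure above, the replace-datanode-on-failure behavior is tunable via client configuration. This is a hedged sketch of an hdfs-site.xml fragment; the property names below are taken from the hdfs-default.xml documentation of this feature, but verify them against your release before relying on them:

```xml
<!-- hdfs-site.xml: relax datanode replacement on pipeline failure.
     Appropriate only for very small clusters (e.g. 1-3 datanodes)
     where no spare datanode exists to satisfy the default policy. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>false</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```

With the feature disabled, appends on a single-datanode cluster proceed with the shrunken pipeline instead of failing the sanity check, at the cost of weaker durability guarantees.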
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245501#comment-13245501 ] Zhanwei.Wang commented on HDFS-3179: @Uma and amith Another question: in this test script, I first create a new EMPTY file and append to it twice. The first append succeeds because the file is empty; to create a pipeline, the stage is PIPELINE_SETUP_CREATE and the policy will not be checked. The second append fails because the stage is PIPELINE_SETUP_APPEND and the policy will be checked. So from the user's point of view, the first append succeeds while the second fails; is that a good idea? {code} // get new block from namenode if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) { if(DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Allocating new block"); } nodes = nextBlockOutputStream(src); initDataStreaming(); } else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) { if(DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Append to block " + block); } setupPipelineForAppendOrRecovery(); //check the policy here initDataStreaming(); } {code} failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical Create a single datanode cluster, disable permissions, enable webhdfs, start hdfs, run the test script. Expected result: a file named test is created and the content is testtest. The result I got: hdfs throws an exception on the second append operation. 
{code} ./test.sh {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}} {code} Log in datanode: {code} 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {code} test.sh {code} #!/bin/sh echo test > test.txt curl -L -X PUT http://localhost:50070/webhdfs/v1/test?op=CREATE; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3000) Add a public API for setting quotas
[ https://issues.apache.org/jira/browse/HDFS-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245506#comment-13245506 ] Daryn Sharp commented on HDFS-3000: --- +1. Although I'd suggest adding a ctor that takes a filesystem instance: the user may want to use a custom configured filesystem, or to avoid creating another fs instance if the fs cache is disabled. Add a public API for setting quotas --- Key: HDFS-3000 URL: https://issues.apache.org/jira/browse/HDFS-3000 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3000.patch, HDFS-3000.patch, HDFS-3000.patch, HDFS-3000.patch Currently one can set the quota of a file or directory from the command line, but if a user wants to set it programmatically, they need to use DistributedFileSystem, which is annotated InterfaceAudience.Private. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245515#comment-13245515 ] Zhanwei.Wang commented on HDFS-3091: Hi, Nicholas {quote} I would say the failures are expected. The feature is to guarantee the number of replicas that the user is asking. However, the cluster is too small that the guarantee is impossible. It makes sense to fail the write requests. {quote} I agree with you, but have a look at the code. In HDFS-3179, I first create an EMPTY file and append twice; the first append finishes successfully but the second fails since there is only one datanode and the number of replicas is 3. Is that what you want to see? I think the policy check should fail on the first write to the file. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 2.0.0 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, I observed a couple of issues. Presently the ReplaceDatanodeOnFailure policy is satisfied even though we don't have enough DNs in the cluster to replace a failed one, and this results in write failure. 
{quote} 12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {quote} Let's take some cases: 1) Replication factor 3 and cluster size also 3, and unfortunately the pipeline drops to 1. ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= replication/2 (3/2==1)*. But when it tries to find a new node as the replacement, obviously it cannot find one, and the sanity check will fail. This results in write failure. 2) Replication factor 10 (accidentally the user sets the replication factor to a value higher than the cluster size), and the cluster has only 5 datanodes. Here even if one node fails, the write will fail for the same reason: the pipeline max will be 5, and after one datanode is killed, existings will be 4. *existings(4) <= replication/2 (10/2==5)* will be satisfied, and obviously it cannot replace the failed node with a new one as no extra nodes exist in the cluster. This results in write failure. 3) sync related operations also fail in these situations (will post the clear scenarios) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
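[Editor's note] The integer arithmetic in the two cases above can be made concrete. This is a minimal sketch paraphrasing the replacement condition as the comment states it (existings at or below replication/2 triggers a replacement request), not the actual ReplaceDatanodeOnFailure source:

```java
public class ReplacePolicyCheck {
    // Paraphrase of the condition discussed above: with replication r,
    // ask for a replacement datanode when the surviving pipeline has
    // shrunk to half the replication or less (integer division).
    public static boolean wantsReplacement(int replication, int existings) {
        return existings <= replication / 2;
    }

    public static void main(String[] args) {
        // Case 1: r=3, pipeline drops to 1 -> replacement requested,
        // but a 3-node cluster has no spare node, so the write fails.
        System.out.println(wantsReplacement(3, 1));  // true
        // Case 2: r=10 on a 5-node cluster, one node dies -> existings=4,
        // 4 <= 10/2 -> replacement requested that cannot be satisfied.
        System.out.println(wantsReplacement(10, 4)); // true
        // Contrast: r=3 with 2 survivors -> no replacement requested.
        System.out.println(wantsReplacement(3, 2));  // false
    }
}
```

Both failing cases reduce to the same mismatch: the policy's arithmetic is satisfied, but the cluster has no node left to add to the pipeline.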
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245523#comment-13245523 ] Suresh Srinivas commented on HDFS-3092: --- bq. When you say three JDs, that's the degree of replication, right? When I said multiple logs, I was referring to multiple namenodes writing to different logs, as with federation. Right, three JDs for degree of replication. However, I do understand multiple logs - that is a log per namespace. For every namespace, in federation, active and standby namenode + additional JD is needed. bq. I think my confusion here is that you require a quorum to be able to acknowledge the operation, but in reality you try to write to everyone. If you can't write to everyone, then you induce a view change (change to JournalList). Is this right? Yes. In the first cut we write to all the JDs that are active. At least quorum should be written. This can be improved in the future by waiting for only Quorum JDs. Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. BackupNode used streaming edits from active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in standby will get the editlogs from the active and write it to local edits. To begin with, the existing standby mechanism of reading from a file, will continue to be used, instead of from shared edits, from the local edits. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
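[Editor's note] The write discipline Suresh describes (attempt delivery to every active JD, acknowledge the edit once at least a quorum has written it) can be sketched as a toy model. This is an editor's illustration, not the HDFS-3092 implementation:

```java
public class QuorumAck {
    // ackCount: how many JDs durably wrote the entry; total: JDs in the
    // JournalList. The edit is considered committed once a strict
    // majority acked, even though the writer sent it to all of them.
    public static boolean committed(int ackCount, int total) {
        return ackCount > total / 2;
    }

    public static void main(String[] args) {
        System.out.println(committed(3, 3)); // true: all three JDs wrote it
        System.out.println(committed(2, 3)); // true: 2 of 3 is a majority
        System.out.println(committed(1, 3)); // false: below quorum -> per the
                                             // comment, induce a view change
    }
}
```

The "write to all, wait for quorum" split is what lets a later improvement wait on only a quorum of JDs without changing the commit rule.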
[jira] [Created] (HDFS-3183) Add JournalManager implementation to use local namenode, remote namenode and a configured JournalDaemon for storing editlogs
Add JournalManager implementation to use local namenode, remote namenode and a configured JournalDaemon for storing editlogs Key: HDFS-3183 URL: https://issues.apache.org/jira/browse/HDFS-3183 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas The JournalManager is used in HA configuration and uses the following journal targets: - local namenode - Other namenode - A configured JournalDaemon target from configuration -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245538#comment-13245538 ] Hadoop QA commented on HDFS-3166: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521164/HDFS-3166.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2169//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2169//console This message is automatically generated. Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3043) HA: ZK-based client proxy provider
[ https://issues.apache.org/jira/browse/HDFS-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3043: -- Target Version/s: Auto failover (HDFS-3042) HA: ZK-based client proxy provider -- Key: HDFS-3043 URL: https://issues.apache.org/jira/browse/HDFS-3043 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, hdfs client Reporter: Todd Lipcon Assignee: Aaron T. Myers When HDFS-2185 is implemented, ZooKeeper can be used to locate the active NameNode. We can use this from the DFS client in order to connect to the correct NN without having to configure a list of possibly-active NNs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3065) HA: Newly active NameNode does not recognize decommissioning DataNode
[ https://issues.apache.org/jira/browse/HDFS-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3065: -- Priority: Minor (was: Major) HA: Newly active NameNode does not recognize decommissioning DataNode - Key: HDFS-3065 URL: https://issues.apache.org/jira/browse/HDFS-3065 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Stephen Chu Priority: Minor I'm working on a cluster where, originally, styx01 hosts the active NameNode and styx02 hosts the standby NameNode. In both styx01's and styx02's exclude file, I added the DataNode on styx03. I then ran _hdfs dfsadmin -refreshNodes_ and verified on the styx01 NN web UI that the DN on styx03 was decommissioning. After waiting a few minutes, I checked the standby NN web UI (while the DN was decommissioning) and didn't see that the DN was marked as decommissioning. I executed manual failover, making the styx02 NN active and the styx01 NN standby. I checked the newly active NN web UI, and the DN was still not marked as decommissioning, even after a few minutes. However, the newly standby NN's web UI still showed the DN as decommissioning. I added another DN to the exclude file and executed _hdfs dfsadmin -refreshNodes_, but the styx02 NN web UI still did not update with the decommissioning nodes. I failed back over to make the styx01 NN active and the styx02 NN standby. I checked the styx01 NN web UI and saw that it correctly marked 2 DNs as decommissioning. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3181: - Description: org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins. {code} org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart Failing for the past 1 build (Since Failed#2163 ) Took 8.4 sec. Error Message Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655) Stacktrace org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by 
DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) ... at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy15.getAdditionalDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:317) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:828) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {code} was: org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins. {code} org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart Failing for the past 1 build (Since Failed#2163 ) Took 8.4 sec. 
Error Message Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245578#comment-13245578 ] Hari Mankude commented on HDFS-3077: Todd, The doc is excellent. Had a comment on a potential issue which could result due to epochnumber with certain failure scenarios. Specifically, I am talking about the scenario in section 2.5.6 J1 is at txid 153, J2 is at txid 150 and J3 is at txid 125. Epochnumber on all the journals is 1. Both NN1 and NN2 are trying to become_active() at the same time. NN1 talks to J1, J2 and sets the proposedEpoch to 2. NN2 talks to J2 and J3 and decides to set the proposedEpoch to 2. NN1 succeeds in setting newEpoch to 2 on J1 and fails on J2 and J3. NN1 dies since it does not have quorum. NN2 succeeds in setting newEpoch to 2 on J2 and J3 and has the quorum. NN2 cannot talk to J1. Similar to the scenario in 2.5.6, NN2 writes 151, 152,153 into J2 and J3 and then dies. So currently, state is epoch number is 2 on all the journals and J1, J2 and J3 are at 153. We have a problem since it is not possible to distinguish between log entries in J1 vs J2 and J3. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
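[Editor's note] Hari's epoch-race scenario above can be simulated with a toy model. This is an editor's sketch, not the HDFS-3077 design: each journal simply accepts a proposed epoch that is strictly greater than its current one, which is enough to show how two writers can both stamp epoch 2 onto disjoint journal sets:

```java
import java.util.Arrays;

public class EpochRace {
    // Current epoch of each journal daemon: J1, J2, J3 (indices 0..2).
    static int[] epoch = {1, 1, 1};

    // A namenode proposes `proposed` to the journals it can reach;
    // returns true iff it obtained acks from a strict majority of all
    // three journals.
    static boolean newEpoch(int proposed, int... journals) {
        int acks = 0;
        for (int j : journals) {
            if (proposed > epoch[j]) { epoch[j] = proposed; acks++; }
        }
        return acks > epoch.length / 2;
    }

    public static void main(String[] args) {
        // NN1 reaches only J1 with proposedEpoch=2: no quorum, NN1 dies.
        boolean nn1 = newEpoch(2, 0);
        // NN2 reaches J2 and J3 with the same proposedEpoch=2: quorum.
        boolean nn2 = newEpoch(2, 1, 2);
        // All three journals now carry epoch 2, yet J1's epoch-2 stamp came
        // from a different writer than J2/J3's -- so epoch alone cannot
        // distinguish J1's log entries from J2/J3's, as the comment notes.
        System.out.println(nn1 + " " + nn2 + " " + Arrays.toString(epoch));
        // prints: false true [2, 2, 2]
    }
}
```

A two-phase protocol (propose an epoch strictly greater than any epoch seen by a quorum, then confirm it on a quorum before writing) is the standard way to close this gap; the model above deliberately omits that step to reproduce the hazard.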
[jira] [Commented] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size
[ https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245582#comment-13245582 ] Tsz Wo (Nicholas), SZE commented on HDFS-3181: -- Hi Todd, thanks for clarifying it. I can reproduce the failure now. testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size - Key: HDFS-3181 URL: https://issues.apache.org/jira/browse/HDFS-3181 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Colin Patrick McCabe Priority: Critical Attachments: TestLeaseRecovery2with1535.patch, repro.txt, testOut.txt org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins. {code} org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart Failing for the past 1 build (Since Failed#2163 ) Took 8.4 sec. Error Message Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655) Stacktrace org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) ... 
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy15.getAdditionalDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:317) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:828) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3153) For HA, a logical name is visible in URIs - add an explicit logical name
[ https://issues.apache.org/jira/browse/HDFS-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3153: -- Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-1623) For HA, a logical name is visible in URIs - add an explicit logical name Key: HDFS-3153 URL: https://issues.apache.org/jira/browse/HDFS-3153 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-3110: - Attachment: HDFS-3110.2.patch Patch addressing Todd's concerns. I added a 'flags' field to hdfsFile that has a bit set if a direct read is supported. I detect that by trying to issue a 0-byte read when the file is created. If an exception is thrown, the flag is cleared, otherwise it is set. Once the flag is set, all subsequent hdfsRead calls will be diverted to hdfsReadDirect. An alternative is to use reflection to grab the input stream inside FsDataInputStream and use reflection to look for ByteBufferReadable, but that feels a little fragile (and complex to do in C); plus if some FS implements read(ByteBuffer) only to stub it out with a UnsupportedOperationException or similar, reads would never work correctly. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
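Henry's probe-and-cache approach generalizes beyond libhdfs. The sketch below (Python, with invented names; the real code is C against the JNI bindings) shows the pattern: issue a zero-byte direct read once at open time, record the result in a flag, and route every later read based on that flag:

```python
# Sketch (hypothetical names, not libhdfs) of the probe-and-cache pattern:
# attempt a zero-byte direct read once when the file is opened; if it
# raises, clear the flag and use the ordinary read path from then on.

class DirectReadUnsupported(Exception):
    pass

class HdfsFile:
    def __init__(self, stream):
        self.stream = stream
        try:
            stream.read_direct(0)       # zero-byte probe at open time
            self.direct_ok = True
        except DirectReadUnsupported:
            self.direct_ok = False

    def read(self, n):
        # Divert to the direct path only if the open-time probe succeeded.
        if self.direct_ok:
            return self.stream.read_direct(n)
        return self.stream.read(n)

class DirectStream:
    """Stand-in for a stream that supports direct (ByteBuffer-style) reads."""
    def read_direct(self, n): return b"x" * n
    def read(self, n): return b"x" * n

class LegacyStream:
    """Stand-in for a stream that only supports the classic read path."""
    def read_direct(self, n): raise DirectReadUnsupported()
    def read(self, n): return b"y" * n

assert HdfsFile(DirectStream()).direct_ok is True
assert HdfsFile(LegacyStream()).direct_ok is False
```

Probing once at open time also sidesteps the fragility Henry notes with reflection: a filesystem that stubs out the direct-read method with an exception is detected immediately and simply falls back.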
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245588#comment-13245588 ] Suresh Srinivas commented on HDFS-3077: --- Thanks for posting the design. Now I understand your comment that there is a lot in common between this one and the approach in HDFS-3092. Here are some high level comments: # Terminology - JournalDaemon or JournalNode. I prefer JournalDaemon because my plan was to run them in the same process space as the namenode. A JournalDaemon could also be a stand-alone process. # I like the idea of quorum writes and maintaining the queue. The 3092 design currently uses a timeout to declare a JD slow and fail it. We were planning to punt on it until we had a first implementation. # newEpoch() is called fence() in HDFS-3092. My preference is to use the name fence(). I was using version # for what is called epoch here; I think the name epoch sounds better. The key difference is that the version # is generated from a znode in HDFS-3092, so two namenodes cannot use the same epoch number. I think there is a bug with the approach you have described, stemming from the fact that two namenodes can use the same epoch and step 3 in 2.4 can be completed independent of quorum. This is shown in Hari's example. # I prefer to record the epoch in the startLogSegment filler record. The startLogSegment record was never part of the journal; we added it for structural reasons, so adding epoch info to it should not matter. The way I see it, a journal belongs to a segment, and a segment has a single version # or epoch. # In both proposals the epoch or version # needs to be sent in all journal requests. We could certainly make a list of common work items and create jiras, so that many people can collaborate and wrap it up, like we did in HDFS-1623. 
Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245602#comment-13245602 ] Hadoop QA commented on HDFS-3110: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521180/HDFS-3110.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2170//console This message is automatically generated. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245601#comment-13245601 ] Tsz Wo (Nicholas), SZE commented on HDFS-3179: -- I think the problem is one datanode with replication 3. What should be the user expectation? It seems that users won't be happy if we do not allow append. However, if we allow appending to a single replica and the replica becomes corrupted, then it is possible to have data loss - I can imagine in some extreme cases that a user is appending to a single replica slowly, an admin adds more datanodes later on but the block won't be replicated since the file is not closed, and then the datanode with the single replica fails. Is this case acceptable to you? So from the user's view, the first append succeeds while the second fails; is that a good idea? The distinction is whether there is pre-append data. There is pre-append data in the replica in the second append. The pre-append data was in a closed file, and if the datanode fails during append, there could be data loss. However, in the first append, there is no pre-append data. If the append fails and the new replica is lost, it is sort of okay since only the new data is lost. The add-datanode feature is to prevent data loss on pre-append data. Users (or an admin) could turn it off as mentioned in HDFS-3091. I think we may improve the error message. Is it good enough? Or any suggestion? 
failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical Create a single datanode cluster disable permissions enable webhfds start hdfs run the test script expected result: a file named test is created and the content is testtest the result I got: hdfs throw an exception on the second append operation. {code} ./test.sh {RemoteException:{exception:IOException,javaClassName:java.io.IOException,message:Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]}} {code} Log in datanode: {code} 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {code} test.sh {code} #!/bin/sh echo test > test.txt curl -L -X PUT http://localhost:50070/webhdfs/v1/test?op=CREATE; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; curl -L -X POST -T test.txt http://localhost:50070/webhdfs/v1/test?op=APPEND; {code}
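For reference, the client-side add-datanode behavior Nicholas mentions can be disabled in the client configuration. The property name below reflects my understanding of the HDFS-3091-era settings and should be checked against hdfs-default.xml for your release:

```xml
<!-- hdfs-site.xml (client side): disable the replace-datanode-on-failure
     feature so appends on tiny clusters do not fail while looking for an
     extra datanode. Property name assumed from HDFS-3091; verify for your
     release. Disabling it trades away the protection for pre-append data
     that Nicholas describes above. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>false</value>
</property>
```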
[jira] [Created] (HDFS-3184) Add public HDFS client API
Add public HDFS client API -- Key: HDFS-3184 URL: https://issues.apache.org/jira/browse/HDFS-3184 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE There are some useful operations in HDFS but not in the FileSystem API; see a list in [Uma's comment|https://issues.apache.org/jira/browse/HDFS-1599?focusedCommentId=13243105page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243105]. These operations should be made available to the public.
[jira] [Commented] (HDFS-1599) Umbrella Jira for Improving HBASE support in HDFS
[ https://issues.apache.org/jira/browse/HDFS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245616#comment-13245616 ] Tsz Wo (Nicholas), SZE commented on HDFS-1599: -- Most of the reflection in HBase has to do with version compatibility, not accessing private APIs. Adding a new API on HDFS doesn't solve the problem, really, since the whole reason for the reflection is to compile against old versions which don't have the new APIs It does not solve the problem today but it will solve the problem in the future. :) Umbrella Jira for Improving HBASE support in HDFS - Key: HDFS-1599 URL: https://issues.apache.org/jira/browse/HDFS-1599 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Umbrella Jira for improved HBase support in HDFS
[jira] [Commented] (HDFS-1599) Umbrella Jira for Improving HBASE support in HDFS
[ https://issues.apache.org/jira/browse/HDFS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245617#comment-13245617 ] Tsz Wo (Nicholas), SZE commented on HDFS-1599: -- Uma, thanks for listing them out. I have created HDFS-3184 for adding new HDFS client APIs. Umbrella Jira for Improving HBASE support in HDFS - Key: HDFS-1599 URL: https://issues.apache.org/jira/browse/HDFS-1599 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Umbrella Jira for improved HBase support in HDFS
[jira] [Commented] (HDFS-3184) Add public HDFS client API
[ https://issues.apache.org/jira/browse/HDFS-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245629#comment-13245629 ] Uma Maheswara Rao G commented on HDFS-3184: --- Great, Thanks a lot Nicholas for filing the JIRA. :-) Add public HDFS client API -- Key: HDFS-3184 URL: https://issues.apache.org/jira/browse/HDFS-3184 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE There are some useful operations in HDFS but not in the FileSystem API; see a list in [Uma's comment|https://issues.apache.org/jira/browse/HDFS-1599?focusedCommentId=13243105page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243105]. These operations should be made available to the public.
[jira] [Updated] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3166: - Resolution: Fixed Fix Version/s: 3.0.0 2.0.0 0.23.3 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 the patch looks good. The failed test is not related. I have committed this. Thanks, Daryn! Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
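The failure mode Daryn's patch addresses is generic: a blocking read on a socket with no read timeout hangs forever if the peer goes silent during a network outage. A minimal Python illustration of the bug class and the fix (the actual Hftp code is Java):

```python
# Generic illustration (Python sockets, not the Java Hftp code) of the
# bug class HDFS-3166 fixes: recv() with no read timeout blocks forever
# when the remote side goes silent; settimeout() bounds the wait.
import socket

# A listener that accepts the connection but never sends any data,
# standing in for a remote host that went quiet during a network outage.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
client.settimeout(0.2)          # the fix: bound how long recv() may block
conn, _ = server.accept()       # keep the peer socket open, send nothing

try:
    client.recv(1)              # would hang indefinitely without a timeout
    timed_out = False
except socket.timeout:
    timed_out = True

assert timed_out
for s in (client, conn, server):
    s.close()
```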
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245656#comment-13245656 ] Todd Lipcon commented on HDFS-3077: --- bq. So currently, state is epoch number is 2 on all the journals and J1, J2 and J3 are at 153. We have a problem since it is not possible to distinguish between log entries in J1 vs J2 and J3. Hey Hari. Thanks for taking a look in such good detail. I think the doc is currently unclear about the proposed solution described in 2.5.6 -- the idea is not to use just the lastPromisedEpoch here to distinguish the JNs, but rather to attach the epoch number to each log segment, based on the epoch in which that segment was started. So, even though in your scenario NN1 sets J1.lastPromisedEpoch=2, the log segment will retain e=1. Once a segment's epoch is set, it is never changed (unless the segment is removed by a synchronization) Does that make sense? If so I will try to clarify the document. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. 
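Todd's fix to Hari's scenario can be illustrated the same way: if the epoch is stamped on each log segment when the segment is started, and never rewritten when lastPromisedEpoch later advances, recovery can tell the writers apart even though lastPromisedEpoch is 2 everywhere. A toy sketch (hypothetical, not the HDFS implementation):

```python
# Toy illustration of Todd's point: the epoch is attached to each log
# segment when the segment is started and is not rewritten when
# lastPromisedEpoch later advances, so recovery can prefer data written
# under the highest per-segment epoch.

journals = {
    # name: (lastPromisedEpoch, segment_epoch, last_txid)
    "J1": (2, 1, 153),   # NN1 bumped lastPromisedEpoch, but segment keeps e=1
    "J2": (2, 2, 153),   # NN2 wrote 151-153 under epoch 2
    "J3": (2, 2, 153),
}

# lastPromisedEpoch is identical everywhere and cannot disambiguate...
assert len({promised for promised, _, _ in journals.values()}) == 1

# ...but the per-segment epoch can: prefer the journals whose current
# segment was started in the highest epoch.
best = max(seg for _, seg, _ in journals.values())
authoritative = [n for n, (_, seg, _) in journals.items() if seg == best]
assert authoritative == ["J2", "J3"]
```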
[jira] [Updated] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3176: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have committed this. Thanks, Kihwal! JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and constructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields().
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245664#comment-13245664 ] Hudson commented on HDFS-3166: -- Integrated in Hadoop-Hdfs-trunk-Commit #2058 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2058/]) HDFS-3166. Add timeout to Hftp connections. Contributed by Daryn Sharp (Revision 1309103) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309103 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245663#comment-13245663 ] Hudson commented on HDFS-3176: -- Integrated in Hadoop-Hdfs-trunk-Commit #2058 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2058/]) HDFS-3176. Use MD5MD5CRC32FileChecksum.readFields() in JsonUtil . Contributed by Kihwal Lee (Revision 1309114) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309114 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and contructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245668#comment-13245668 ] Hudson commented on HDFS-3166: -- Integrated in Hadoop-Common-trunk-Commit #1983 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1983/]) HDFS-3166. Add timeout to Hftp connections. Contributed by Daryn Sharp (Revision 1309103) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309103 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245667#comment-13245667 ] Hudson commented on HDFS-3176: -- Integrated in Hadoop-Common-trunk-Commit #1983 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1983/]) HDFS-3176. Use MD5MD5CRC32FileChecksum.readFields() in JsonUtil . Contributed by Kihwal Lee (Revision 1309114) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309114 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and contructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3176) JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
[ https://issues.apache.org/jira/browse/HDFS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245681#comment-13245681 ] Hudson commented on HDFS-3176: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1996/]) HDFS-3176. Use MD5MD5CRC32FileChecksum.readFields() in JsonUtil. Contributed by Kihwal Lee (Revision 1309114) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309114 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own. --- Key: HDFS-3176 URL: https://issues.apache.org/jira/browse/HDFS-3176 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0, 1.0.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 1.1.0, 0.23.3, 2.0.0, 3.0.0 Attachments: hdfs-3176-branch-1.patch, hdfs-3176.patch Currently JsonUtil, used by webhdfs, parses the MD5MD5CRC32FileChecksum binary bytes on its own and constructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields().
[jira] [Commented] (HDFS-3166) Hftp connections do not have a timeout
[ https://issues.apache.org/jira/browse/HDFS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245682#comment-13245682 ] Hudson commented on HDFS-3166: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1996/]) HDFS-3166. Add timeout to Hftp connections. Contributed by Daryn Sharp (Revision 1309103) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1309103 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HsftpFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHftpURLTimeouts.java Hftp connections do not have a timeout -- Key: HDFS-3166 URL: https://issues.apache.org/jira/browse/HDFS-3166 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.2, 0.23.3, 2.0.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: HADOOP-8221.branch-1.patch, HADOOP-8221.patch, HADOOP-8221.patch, HDFS-3166.patch, HDFS-3166.patch Hftp connections do not have read timeouts. This leads to indefinitely hung sockets when there is a network outage during which time the remote host closed the socket. This may also affect WebHdfs, etc. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
[ https://issues.apache.org/jira/browse/HDFS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245686#comment-13245686 ] Eli Collins commented on HDFS-3148: --- Hey Suresh, This feature is actually independent of all the other HDFS-3140 sub-tasks, and multihoming in general, and therefore does not require any further jiras. It covers using multiple interfaces on the *client* side; the others are all about using multiple interfaces on the *server* side. These can both be used independently, e.g. it's just as valuable to use multiple local interfaces on the client side even if you don't use multihoming on the server side. Happy to pull it out to its own top-level jira if that's clearer. Ditto, lemme know if you think the other HDFS-3140 jiras should be in a branch. Just enabling multihoming requires HDFS-3146 and HDFS-3147, and a branch for a couple of jiras felt like overkill. Much of the work has been in the cleanup of DatanodeID and friends. Thanks, Eli The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0, 2.0.0 Attachments: hdfs-3148-b1.txt, hdfs-3148-b1.txt, hdfs-3148.txt, hdfs-3148.txt, hdfs-3148.txt HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. 
Binding the client socket to a specific address is not sufficient to ensure egress traffic uses that interface. E.g. if multiple interfaces are on the same subnet, the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however, it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require of clients. Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is actually useful/independent of the other HDFS-3140 changes.
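The binding step described above (bind the client socket to a chosen local address before connecting) can be sketched with plain JDK sockets. This is an illustration only, not the HDFS-3148 patch code; the address is hardcoded to loopback here, whereas the patch resolves addresses from dfs.client.local.interfaces:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class LocalBindSketch {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) {
            // Bind to a specific local address before connect(); outbound
            // traffic then carries this source address, so the kernel's
            // routing rules select the matching interface. Port 0 asks the
            // OS for an ephemeral port.
            s.bind(new InetSocketAddress("127.0.0.1", 0));
            System.out.println(s.getLocalAddress().getHostAddress());
        }
    }
}
```

As the comment thread notes, this only sets the source address; whether egress actually leaves the matching interface still depends on the host's IP routing rules.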
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245691#comment-13245691 ] Todd Lipcon commented on HDFS-3077: --- bq. Terminology - JournalDaemon or JournalNode. I prefer JournalDaemon because my plan was to run them in the same process space as the namenode. A JournalDaemon could also be a stand-alone process. I prefer JournalNode because every other daemon we have is a *Node. If you're running it inside another process, I think we would just call it a JournalService -- or an embedded JournalNode. I think of a daemon as a standalone process. bq. I like the idea of quorum writes and maintaining the queue. 3092 design currently uses timeout to declare a JD slow and fail it. We were planning to punt on it until we had a first implementation. OK. This part I have done in the patch attached here and it works pretty well so far. If you want, I'm happy to separate out the quorum completion code to commit it ASAP so we can share code here. bq. newEpoch() is called fence() in HDFS-3092. My preference is to use the name fence(). I was using a version # which is called epoch. I think the name epoch sounds better. The key difference is that the version # is generated from a znode in HDFS-3092. As I had commented earlier on this ticket, I originally was planning to do something similar to you, bootstrapping off of ZK to generate epoch numbers. But then, when I got into coding, I realized that this algorithm is actually not so hard to implement, and adding a dependency on ZK actually adds to the combinatorics of things to think about. I think the standalone nature of the approach outweighs what benefit we might get by reusing ZK. bq. So two namenodes cannot use the same epoch number. I think there is a bug with the approach you have described, stemming from the fact that two namenodes can use the same epoch, and step 3 in 2.4 can be completed independent of quorum. This is shown in Hari's example. 
How can step 3 in section 2.4 be completed independent of quorum? Step 4 indicates that it requires a quorum of nodes to respond successfully to the {{newEpoch}} message. Here's an example: Initial state:
||Node||lastPromisedEpoch||
|JN1|1|
|JN2|1|
|JN3|1|
1. Two NNs (NN1 and NN2) enter step 1 concurrently. They both receive responses indicating {{lastPromisedEpoch==1}} from all of the JNs.
2. They both propose {{newEpoch(2)}}. The behavior of the JN ensures that it will only respond success to either NN1 or NN2, but not both (since it will fail if proposedEpoch <= lastPromisedEpoch).
So, either NN1 or NN2 gets success from a majority. The other node will only get success from a minority, and thus will abort. Note that with message losses or failures, it's possible for _neither_ of the nodes to get a quorum in the case of a race. That's OK, since we expect that an external leader election framework will eventually assist such that only one NN is trying to become active, and then that NN will win. Note that the epoch algorithm is cribbed from ZAB; see page 7 of Yahoo tech report YL-2010-0007. The mapping from ZAB terminology is:
||ZAB term||QJournal term||
|CEPOCH(e)|Response to getLastPromisedEpoch()|
|NEWEPOCH(e')|newEpoch(proposedEpoch)|
|ACK-E(...)|success response to newEpoch()|
Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. 
This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow.
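The JN-side acceptance rule discussed in the comment above (a JournalNode promises a proposed epoch only if it is strictly greater than its lastPromisedEpoch, so at most one NN can win a given epoch at each JN) can be sketched as follows. Class and method names are illustrative only, not the actual patch code:

```java
public class JournalNodeEpochSketch {
    private long lastPromisedEpoch = 1;

    // Sketch of the newEpoch() acceptance rule: reject any proposal that
    // is not strictly greater than what was already promised. Two NNs
    // racing with the same proposed epoch therefore cannot both succeed
    // at the same JN, which is what makes the quorum in step 4 exclusive.
    synchronized boolean newEpoch(long proposedEpoch) {
        if (proposedEpoch <= lastPromisedEpoch) {
            return false; // already promised this epoch or a higher one
        }
        lastPromisedEpoch = proposedEpoch;
        return true;
    }

    public static void main(String[] args) {
        JournalNodeEpochSketch jn = new JournalNodeEpochSketch();
        boolean nn1 = jn.newEpoch(2); // first proposer of epoch 2 wins
        boolean nn2 = jn.newEpoch(2); // concurrent proposer of epoch 2 fails
        System.out.println(nn1 + " " + nn2); // prints: true false
    }
}
```

Running the race from the example above against a single JN shows the first proposal of epoch 2 accepted and the second rejected; a quorum of such JNs yields at most one winning NN per epoch.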
[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1
[ https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3055: --- Attachment: HDFS-3055-b1.002.patch * add unit test * some fixes to NN unclean shutdown (to allow unit test to work) * better error reporting for the branch-1 edit log stuff (print out the offset when we encounter a problem) Implement recovery mode for branch-1 Key: HDFS-3055 URL: https://issues.apache.org/jira/browse/HDFS-3055 Project: Hadoop HDFS Issue Type: New Feature Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 1.0.0 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch Implement recovery mode for branch-1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3185) Setup configuration for Journal Manager and Journal Services
Setup configuration for Journal Manager and Journal Services Key: HDFS-3185 URL: https://issues.apache.org/jira/browse/HDFS-3185 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hari Mankude Assignee: Hari Mankude
[jira] [Created] (HDFS-3186) Sync lagging journal service from the active journal service
Sync lagging journal service from the active journal service Key: HDFS-3186 URL: https://issues.apache.org/jira/browse/HDFS-3186 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hari Mankude
[jira] [Updated] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3187: -- Attachment: hdfs-3187.txt Attached patch upgrades guava in the pom, and also fixes two calls to methods that have been removed in this version of guava. Unfortunately the QA bot won't be able to run this patch since it changes the top-level pom. Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3186) Sync lagging journal service from the active journal service
[ https://issues.apache.org/jira/browse/HDFS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Mankude reassigned HDFS-3186: -- Assignee: Hari Mankude Sync lagging journal service from the active journal service Key: HDFS-3186 URL: https://issues.apache.org/jira/browse/HDFS-3186 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Hari Mankude Assignee: Hari Mankude -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3187) Upgrade guava to 11.0.2
Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3187: -- Status: Patch Available (was: Open) Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245721#comment-13245721 ] Hadoop QA commented on HDFS-3187: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521212/hdfs-3187.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2171//console This message is automatically generated. Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
[ https://issues.apache.org/jira/browse/HDFS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3188: -- Attachment: hdfs-3188.txt Attached patch implements QuorumCall as described, and includes a unit test. Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3188.txt This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
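The QuorumCall idea described above (wrap a set of asynchronous calls, then wait for a given number of responses or a timeout) can be sketched in a few lines. This uses the JDK's CompletableFuture in place of Guava's ListenableFuture, and the names are hypothetical, not the attached patch:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QuorumWaitSketch {
    // Wait until at least `quorum` of the futures have completed (success
    // or failure), or the timeout elapses; return how many completed.
    static int waitForQuorum(List<CompletableFuture<Void>> calls,
                             int quorum, long timeoutMs) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(quorum);
        AtomicInteger completed = new AtomicInteger();
        for (CompletableFuture<Void> f : calls) {
            f.whenComplete((v, t) -> { completed.incrementAndGet(); latch.countDown(); });
        }
        latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<CompletableFuture<Void>> calls = List.of(
            CompletableFuture.completedFuture(null),  // journal 1 responded
            CompletableFuture.completedFuture(null),  // journal 2 responded
            new CompletableFuture<>());               // journal 3 never responds
        int responses = waitForQuorum(calls, 2, 100);
        System.out.println(responses >= 2); // prints: true (2 of 3 is a quorum)
    }
}
```

The key property, as in the design document, is that the caller makes progress once a majority responds rather than blocking on the slowest journal.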
[jira] [Created] (HDFS-3189) Add preliminary QJournalProtocol interface, translators
Add preliminary QJournalProtocol interface, translators --- Key: HDFS-3189 URL: https://issues.apache.org/jira/browse/HDFS-3189 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon This JIRA is to add the preliminary code for the QJournalProtocol. This protocol differs from JournalProtocol in the following ways: - each call has context information indicating the epoch number of the requester - it contains calls that are specific to epoch number generation, etc, which do not apply to other journaling daemons such as the BackupNode My guess is that, at some point, we can merge back down to one protocol, but during the initial implementation phase, it will be useful to have a distinct protocol for this project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
[ https://issues.apache.org/jira/browse/HDFS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3188: -- Status: Patch Available (was: Open) Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3188.txt This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245729#comment-13245729 ] Bikas Saha commented on HDFS-3077: -- Nice doc! Greatly sped up understanding the design instead of having to grok it from the patch :) I think it will help clarify the doc, if you add the explanation for Hari's example. Even though epoch 2 is persisted on JN1, its last log segment is still tied to epoch 1 and it needs to sync its last log segment with JN2/JN3. Are you proposing that JN1 drop its last edits in progress and pick up the corresponding finalized segment from JN1/JN2. Or is it TBD? Btw, there is some new code here but there seems to be some code in existing NN that changes the sequential journal sync to parallel (based on reading your doc and not your patch). I am guessing there will be other significant changes going forward. Are you planning on committing this to a branch or directly to trunk? Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-3189) Add preliminary QJournalProtocol interface, translators
[ https://issues.apache.org/jira/browse/HDFS-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3189: -- Attachment: hdfs-3189-prelim.txt Preliminary patch, there are still a couple TODOs/cleanup to do before this is committable. Add preliminary QJournalProtocol interface, translators --- Key: HDFS-3189 URL: https://issues.apache.org/jira/browse/HDFS-3189 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3189-prelim.txt This JIRA is to add the preliminary code for the QJournalProtocol. This protocol differs from JournalProtocol in the following ways: - each call has context information indicating the epoch number of the requester - it contains calls that are specific to epoch number generation, etc, which do not apply to other journaling daemons such as the BackupNode My guess is that, at some point, we can merge back down to one protocol, but during the initial implementation phase, it will be useful to have a distinct protocol for this project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Minor This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
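The utility described in the first bullet above (atomically and durably record a single long value, such as the last promised epoch) is commonly implemented as write-to-temp-then-rename. A sketch under that assumption; LongFileSketch is a made-up name, and a production version would also fsync the file and its directory before the rename:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LongFileSketch {
    // Write the value to a sibling temp file, then atomically rename it
    // over the target, so a crash mid-write never leaves a truncated or
    // half-written value on disk.
    static void writeLong(Path target, long value) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(value).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    static long readLong(Path target) throws IOException {
        return Long.parseLong(Files.readString(target).trim());
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempDirectory("epoch").resolve("last-promised-epoch");
        writeLong(f, 2);
        System.out.println(readLong(f)); // prints: 2
    }
}
```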
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245736#comment-13245736 ] Todd Lipcon commented on HDFS-3077: --- bq. I think it will help clarify the doc, if you add the explanation for Hari's example. Even though epoch 2 is persisted on JN1, its last log segment is still tied to epoch 1 and it needs to sync its last log segment with JN2/JN3. Are you proposing that JN1 drop its last edits in progress and pick up the corresponding finalized segment from JN1/JN2. Or is it TBD? Yes, I think it would see that its copy of the segment is out of date epoch-wise, delete it, and then copy the finalized segments from the other nodes later. I'll try to expand upon this portion of the doc in the coming days. I also have another idea which may be slightly simpler -- Suresh got me thinking about it a bit. Basically the idea is that, instead of deleting empty edit logs, we could fill them in with a single NOOP transaction. Let me think on this for a little while and then update the design doc if it turns out to work. bq. Btw, there is some new code here but there seems to be some code in existing NN that changes the sequential journal sync to parallel (based on reading your doc and not your patch). Nope, the thinking is that all of the new code will be encapsulated by QuorumJournalManager. So, from the NN's perspective, there is only a single edit log. It happens that that edit log is distributed and fault-tolerant underneath, but the NN would see it as a single required journal, and crash if it fails to sync. bq. Are you planning on committing this to a branch or directly to trunk? I'm happy to do either. Suresh seemed to think doing it on a branch would be counter-productive to code sharing. In practice it's almost new code, so as long as we're clear to mark it in-progress or experimental, I don't think it would be destabilizing to do in trunk. 
HDFS-3190 is the one place in which I've modified NN code, but only trivially. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, name-node Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3077-partial.txt, qjournal-design.pdf Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3190: -- Attachment: hdfs-3190.txt Simple patch implements the above. Does not add unit tests since it's a straight refactor of existing code, and that code is covered by many existing tests. Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Minor Attachments: hdfs-3190.txt This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245739#comment-13245739 ] Bikas Saha commented on HDFS-2185: -- I think you are missing the failure arc when transitionToStandby is called in the InElection state. Is there any scope for admin operations in the ZKFC? Will the ZKFC receive and accept a signal (manual admin / automatic machine reboot) to stop services? At that point, in the InElection state, how will it know whether it needs to send transitionToStandby (based on whether it is active or not)?

HA: HDFS portion of ZK-based FailoverController --- Key: HDFS-2185 URL: https://issues.apache.org/jira/browse/HDFS-2185 Project: Hadoop HDFS Issue Type: Sub-task Components: auto-failover, ha Affects Versions: 0.24.0, 0.23.3 Reporter: Eli Collins Assignee: Todd Lipcon Fix For: Auto failover (HDFS-3042) Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.tex

This jira is for a ZK-based FailoverController daemon. The FailoverController is a separate daemon from the NN that does the following:
* Initiates leader election (via ZK) when necessary
* Performs health monitoring (aka failure detection)
* Performs fail-over (standby to active and active to standby transitions)
* Heartbeats to ensure liveness

It should have the same/similar interface as the Linux HA RM to aid pluggability.
[jira] [Updated] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3190: -- Assignee: Todd Lipcon Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245752#comment-13245752 ] Bikas Saha commented on HDFS-3077: -- bq. Nope, the thinking is that all of the new code will be encapsulated by QuorumJournalManager. So, from the NN's perspective, there is only a single edit log. It happens that that edit log is distributed and fault-tolerant underneath, but the NN would see it as a single required journal, and crash if it fails to sync.

Got it. So local edits and remote edits would be replaced by a single qjournal edits log.
[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245755#comment-13245755 ] Zhanwei.Wang commented on HDFS-3179: I totally agree with you about the problem of one datanode with replication 3; I think this kind of operation should fail, or at least produce a warning. My opinion is that the purpose of the policy check is to make sure there is no potential data loss. In this one-datanode, three-replica case, although the first append failure will not cause data loss, the data appended after the first successful append is in danger, because there is only one replica rather than the three the user expected, and there is no warning to tell the user the truth. My suggestion is to make the first write to the empty file fail if there are not enough datanodes; in other words, make the policy check stricter, and make the error message friendlier than nodes.length != original.length + 1.

failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster Key: HDFS-3179 URL: https://issues.apache.org/jira/browse/HDFS-3179 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.2 Reporter: Zhanwei.Wang Priority: Critical

Create a single-datanode cluster, disable permissions, enable webhdfs, start HDFS, and run the test script. Expected result: a file named test is created and its content is testtest. Actual result: HDFS throws an exception on the second append operation.
{code}
./test.sh
{"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}}
{code}

Log in datanode:

{code}
2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
{code}

test.sh:

{code}
#!/bin/sh
echo test > test.txt
curl -L -X PUT "http://localhost:50070/webhdfs/v1/test?op=CREATE"
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
{code}
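For context: the "Failed to add a datanode" check that trips here is the client's replace-datanode-on-failure policy, which (if the configuration keys are available in the version in question) can be relaxed on clusters that have fewer datanodes than the replication factor. A sketch for hdfs-site.xml; this is a test-cluster workaround, not a fix for the unfriendly error message discussed above:

```xml
<property>
  <!-- On a cluster with fewer datanodes than the replication factor, the
       client can never find a replacement node during pipeline recovery.
       NEVER disables the replacement attempt entirely; appropriate only
       for small test setups, never for production clusters. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```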
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245765#comment-13245765 ] Bikas Saha commented on HDFS-3190: -- +1 lgtm.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245771#comment-13245771 ] Suresh Srinivas commented on HDFS-3077: --- bq. Suresh seemed to think doing it on a branch would be counter-productive to code sharing

There is a branch already created for HDFS-3092. We could use that.
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245774#comment-13245774 ] Todd Lipcon commented on HDFS-3110: --- my top comment got chopped somehow above: - I like the refactoring out of readPrepare and handleReadResult. But, these should be declared {{static}}

libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch

Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C.
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245773#comment-13245773 ] Todd Lipcon commented on HDFS-3110: ---
- I like the refactoring out of readPrepare and handleReadResult. But, these should be declared {{static}}
- I think your new patch was actually a delta vs the old patch, instead of a completely new one vs trunk. We need a new one for QA/commit.
- When NewDirectByteBuffer returns NULL with no errno set, I think it's better to set {{errno = ENOMEM;}} in an {{else}} clause -- just a little easier to read.
- The new flag HDFS_SUPPORTS_DIRECT_READ is only used internally, so I'm not sure it belongs in the public header hdfs.h (this is what users include, right?). Also, I think it would be better named something like {{HDFS_FILE_SUPPORTS_DIRECT_READ}}, since it refers to a specific stream rather than the entire FS.
- Rather than declaring it as a {{const}}, I think it's better to use an enum or #define, since consts are a C++ thing and this code is mostly straight C. Also, I think it's better to define it as (1 << 0) to indicate that this is going to be in a bitfield.
- Please add a comment above the definition of the new flag referring to hdfsFile_internal.flags, so we know where the flags end up.
- The new {{flags}} field should be unsigned -- {{uint32_t}} probably.
- In the new test, why are you hardcoding {{localhost:20300}}? I'd think using {{default}} as before is the right choice, since it will pick up whatever {{fs.default.name}} is in your {{core-site.xml}} on the classpath.
That way this same test can be run against local FS or against DFS.
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245776#comment-13245776 ] Sanjay Radia commented on HDFS-3092: Is there a way to turn off the striping even if the quorum size (Q) is less than the ensemble size (E)? We like the idea that each journal file contains ALL entries. Our default config: Q is 2 and the set of JDs is 3 (roughly equivalent to E).

Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf

Currently the standby namenode relies on reading shared editlogs to stay current with the active namenode for namespace changes. BackupNode used streaming edits from the active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading from a file will continue to be used, reading from the local edits instead of from the shared edits.
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245778#comment-13245778 ] Todd Lipcon commented on HDFS-3190: --- Thanks, Bikas. Quick question for reviewers: when I moved this code, I noticed the {{canRead()}} check. Currently, if the file exists but can't be read, it returns the default value. I thought this was a little suspicious. Is anyone averse to removing that check, so that we throw an exception if the file exists but we can't read it? Or is it better to keep this as a straight refactor and file a follow-up to think about that?
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245785#comment-13245785 ] Bikas Saha commented on HDFS-3077: -- I have a question about syncing journal nodes and quorum-based writes. There will always be a case where a lost journal node comes back up and is syncing its state; the extreme example of this is replacement of a broken journal node with a new one. While it is doing this, will it be part of the quorum when a quorum number of writes must succeed? Say we have 3 journals with the following txids:
JN1-100, JN2-100, JN3-0 (JN3 just joined)
Now say some writes go to JN2 and JN3 (a quorum commit, with JN1's records in flight in the queue because JN1 is slow):
JN1-100, JN2-110, JN3-110+syncing_holes
At this point something terrible happens, and when we recover we can only access JN1 and JN3:
JN1-100, JN3-110+syncing_holes
How do we then resolve the ground truth about the journal state and edit logs?
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245806#comment-13245806 ] Todd Lipcon commented on HDFS-3077: --- Hi Bikas. Thanks for bringing up this scenario. I do need to add a section to the doc about failure handling and re-adding failed journals. My thinking is that the granularity of membership is the log segment. This is similar to what we do on local disks today - when we roll the edit log, we attempt to re-add any disks that previously failed. Similarly, when we start a new log segment, we give all of the JNs a chance to pick back up following along with the quorum. To try to map to your example, we'd have the following: JN1: writing edits_inprogress_1 (@txn 100) JN2: writing edits_inprogress_1 (@txn 100) JN3: has been reformatted, comes back online At this point, the QJM can try to write txns to all three, but JN3 won't accept transactions because it doesn't have a currently open log segment. Currently it will just reject them. I can imagine a future optimization in which it would return a special exception, and the QJM could notify the NN that it would like to roll ASAP if possible. Let's say we write another 20 txns, and then roll logs. On the next startLogSegment call, we'd end up with the following: JN1: edits_1-120, edits_inprogress_121 JN2: edits_1-120, edits_inprogress_121 JN3: edits_inprogress_121 so all nodes are now taking part in the quorum. We could optionally at this point have JN3 copy over the edits_1-120 segment from one of the other nodes, but that copy can be asynchronous. It's a repair operation, but given we already have 2 valid replicas, we aren't in any imminent danger of data loss. 
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245817#comment-13245817 ] Eli Collins commented on HDFS-3187: --- +1 looks good

Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3187.txt

Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s.
[jira] [Commented] (HDFS-3188) Add infrastructure for waiting for a quorum of ListenableFutures to respond
[ https://issues.apache.org/jira/browse/HDFS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245818#comment-13245818 ] Hadoop QA commented on HDFS-3188: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521216/hdfs-3188.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2172//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2172//console This message is automatically generated.

Add infrastructure for waiting for a quorum of ListenableFutures to respond --- Key: HDFS-3188 URL: https://issues.apache.org/jira/browse/HDFS-3188 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3188.txt

This JIRA adds the {{QuorumCall}} class which is used in HDFS-3077. As described in the design document, this class allows a set of ListenableFutures to be wrapped, and the caller can wait for a specific number of responses, or a timeout.
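The quorum-wait behavior described for {{QuorumCall}} can be illustrated without Guava. A rough sketch of the core idea (wait until a majority of per-journal calls succeed, fail early once a majority can no longer be reached, or time out), using the JDK's CompletableFuture and made-up names rather than the actual ListenableFuture-based API in the patch:

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only, not the patch's QuorumCall implementation.
class QuorumWait {
    static <T> void awaitQuorum(List<CompletableFuture<T>> calls, int quorum,
                                long timeoutMs) throws Exception {
        CompletableFuture<Void> done = new CompletableFuture<>();
        AtomicInteger successes = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        for (CompletableFuture<T> call : calls) {
            call.whenComplete((result, err) -> {
                if (err == null) {
                    if (successes.incrementAndGet() >= quorum) {
                        done.complete(null); // enough responses: quorum reached
                    }
                } else if (failures.incrementAndGet() > calls.size() - quorum) {
                    // so many calls failed that a quorum can never be reached
                    done.completeExceptionally(new IOException(
                        "quorum of " + quorum + " unreachable", err));
                }
            });
        }
        // Blocks for quorum success, quorum failure, or the timeout.
        done.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Note that the caller returns as soon as a quorum responds; slow or dead journals are simply left behind, which is the property the writer relies on.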
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245827#comment-13245827 ] Todd Lipcon commented on HDFS-3187: --- Thanks Eli. I double-checked that all the MR, HDFS, and Common tests and code still compile with this change. I didn't run the full suite, but the new guava release is compatible with the old aside from the {{Files}} changes I dealt with in the patch. Will commit momentarily.
[jira] [Updated] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3187: -- Resolution: Fixed Fix Version/s: 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245847#comment-13245847 ] Suresh Srinivas commented on HDFS-3077: --- bq. How can step 3 in section 2.4 be completed independent of quorum? Step 4 indicates that it requires a quorum of nodes to respond successfully to the newEpoch message. Here's an example:

What I meant was that at each JN, step 3 completes. Hence the example Hari was giving.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245850#comment-13245850 ] Suresh Srinivas commented on HDFS-3077: --- bq. so all nodes are now taking part in the quorum. We could optionally at this point have JN3 copy over the edits_1-120 segment from one of the other nodes, but that copy can be asynchronous. It's a repair operation, but given we already have 2 valid replicas, we aren't in any imminent danger of data loss.

The proposal in HDFS-3092 is to make JN3 part of the quorum only when it has caught up with the other JNs. Having this simplifies some boundary conditions.
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245854#comment-13245854 ] Hudson commented on HDFS-3187: -- Integrated in Hadoop-Hdfs-trunk-Commit #2060 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2060/]) HDFS-3187. Upgrade guava to 11.0.2. Contributed by Todd Lipcon. (Revision 1309181) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309181 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml Upgrade guava to 11.0.2 --- Key: HDFS-3187 URL: https://issues.apache.org/jira/browse/HDFS-3187 Project: Hadoop HDFS Issue Type: Sub-task Components: build Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 2.0.0 Attachments: hdfs-3187.txt Guava r11 includes some nice features which we'd like to use in the implementation of HDFS-3077. In particular, {{MoreExecutors.listeningDecorator}} allows a normal {{ExecutorService}} to be turned into a {{ListeningExecutorService}}, so that tasks can be submitted to it and then wrapped as {{ListenableFuture}}s.
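The {{MoreExecutors.listeningDecorator}} feature mentioned in the issue description can be used roughly as follows; this is a generic Guava usage sketch (assuming Guava 11+ on the classpath), not code from the HDFS patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class ListeningDecoratorDemo {
    static int runTask() throws Exception {
        ExecutorService plain = Executors.newSingleThreadExecutor();
        // Wrap a plain ExecutorService so that submit() returns ListenableFutures
        // instead of plain Futures.
        ListeningExecutorService listening = MoreExecutors.listeningDecorator(plain);
        ListenableFuture<Integer> future = listening.submit(() -> 40 + 2);
        int result = future.get(); // blocks until the task completes
        listening.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTask()); // prints 42
    }
}
```

The returned {{ListenableFuture}} also supports completion listeners, which is what makes it attractive for coordinating many asynchronous RPCs to JournalNodes in HDFS-3077.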
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245857#comment-13245857 ] Hudson commented on HDFS-3187: -- Integrated in Hadoop-Common-trunk-Commit #1985 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1985/]) HDFS-3187. Upgrade guava to 11.0.2. Contributed by Todd Lipcon. (Revision 1309181) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309181 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml
[jira] [Updated] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-3110: - Attachment: HDFS-3110.3.patch New patch that's actually a diff vs trunk this time :/ I incorporated most of Todd's suggestions. I've left HDFS_FILE_SUPPORTS_DIRECT_READ in hdfs.h for now so that users who *really* want to turn off support for some reason (perhaps a bug) have access to the flag that they can set in hdfsFile's guts. I ran the tests against the default local filesystem when no fs.default.name is set, and observed no errors except that the tests expect readDirect to be available. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, HDFS-3110.3.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C.
[jira] [Reopened] (HDFS-1378) Edit log replay should track and report file offsets in case of errors
[ https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reopened HDFS-1378: Assignee: Colin Patrick McCabe (was: Aaron T. Myers) I'd like to port this to branch-1 so that we can have better error messages there. It should be a trivial port. Any objections? Edit log replay should track and report file offsets in case of errors -- Key: HDFS-1378 URL: https://issues.apache.org/jira/browse/HDFS-1378 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Fix For: 0.23.0 Attachments: hdfs-1378-branch20.txt, hdfs-1378.0.patch, hdfs-1378.1.patch, hdfs-1378.2.txt Occasionally there are bugs or operational mistakes that result in corrupt edit logs which I end up having to repair by hand. In these cases it would be very handy to have the error message also print out the file offsets of the last several edit log opcodes so it's easier to find the right place to edit in the OP_INVALID marker. We could also use this facility to provide a rough estimate of how far along edit log replay the NN is during startup (handy when a 2NN has died and replay takes a while)
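The offset tracking this issue describes might be sketched like this; the class and method names here are illustrative, not the ones in the patch. A wrapper stream counts bytes as opcodes are read, so a corruption error can report where in the edit-log file it occurred:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only (not the actual Hadoop classes): wrap the edit-log
// input stream so every read advances a byte counter, letting "corrupt edit
// log" error messages include the file offset of the bad opcode.
class PositionTrackingInputStream extends FilterInputStream {
    private long pos = 0;

    PositionTrackingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) pos++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) pos += n;
        return n;
    }

    /** Offset to report in error messages, and to estimate replay progress. */
    long getPos() {
        return pos;
    }
}
```

The same counter doubles as a progress indicator: comparing {{getPos()}} against the log file's length gives the rough replay-progress estimate mentioned in the description.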
[jira] [Updated] (HDFS-1378) Edit log replay should track and report file offsets in case of errors
[ https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-1378: --- Attachment: HDFS-1378-b1.002.patch * port to branch-1
[jira] [Commented] (HDFS-3187) Upgrade guava to 11.0.2
[ https://issues.apache.org/jira/browse/HDFS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245867#comment-13245867 ] Hudson commented on HDFS-3187: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1998 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1998/]) HDFS-3187. Upgrade guava to 11.0.2. Contributed by Todd Lipcon. (Revision 1309181) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1309181 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245871#comment-13245871 ] Hadoop QA commented on HDFS-3190: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521220/hdfs-3190.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2173//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2173//console This message is automatically generated. Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3190.txt This JIRA is for some simple refactors in the NN:
- refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch
- refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory.
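A minimal sketch of what the proposed LongContainingFile utility might do (the class name comes from the description above, but this implementation is hypothetical): durability comes from syncing a temp file to disk before atomically renaming it over the target, the usual pattern for files like seen_txid:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a LongContainingFile-style utility (not the actual
// Hadoop implementation): persist a single long durably by writing a temp
// file, fsync-ing it, then renaming it over the target file.
public class LongContainingFile {
    public static void writeLong(File target, long value) throws IOException {
        File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(Long.toString(value).getBytes(StandardCharsets.UTF_8));
            out.getFD().sync(); // durably flush the contents before renaming
        }
        if (!tmp.renameTo(target)) {
            throw new IOException("rename of " + tmp + " to " + target + " failed");
        }
    }
}
```

On POSIX filesystems the rename is atomic, so a crash at any point leaves either the old value or the new one on disk, never a partially written file; that is the property a JournalNode needs when recording its last promised epoch.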