[jira] [Created] (HDFS-7523) Setting a socket receive buffer size in DFSClient

2014-12-15 Thread Liang Xie (JIRA)
Liang Xie created HDFS-7523:
---

 Summary: Setting a socket receive buffer size in DFSClient
 Key: HDFS-7523
 URL: https://issues.apache.org/jira/browse/HDFS-7523
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie


It would be nice if we could set a socket receive buffer size when creating the 
socket from the client (HBase) side. In older versions this would be done in 
DFSInputStream; in trunk it seems it should go here:
{code}
  @Override // RemotePeerFactory
  public Peer newConnectedPeer(InetSocketAddress addr,
      Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
      throws IOException {
    Peer peer = null;
    boolean success = false;
    Socket sock = null;
    try {
      sock = socketFactory.createSocket();
      NetUtils.connect(sock, addr,
          getRandomLocalInterfaceAddr(),
          dfsClientConf.socketTimeout);
      peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
          blockToken, datanodeId);
      peer.setReadTimeout(dfsClientConf.socketTimeout);
{code}

e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);

The default socket receive buffer size on Linux with JDK7 seems to be 8k, if I am 
not wrong. That value is sometimes too small for HBase reading 64k blocks over a 
10G network (at the least, it means more system calls).
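
For illustration, a minimal sketch of the proposed change, assuming the call goes 
in right after the socket is created (DEFAULT_DATA_SOCKET_SIZE is the existing 
128 KB default; a dedicated config key would likely be preferable to a constant):
{code}
      sock = socketFactory.createSocket();
      // Set the receive buffer before connecting so the kernel can size the
      // TCP window accordingly; the constant is only the existing default.
      sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
      NetUtils.connect(sock, addr,
          getRandomLocalInterfaceAddr(),
          dfsClientConf.socketTimeout);
{code}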





[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed

2014-12-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246436#comment-14246436
 ] 

Rakesh R commented on HDFS-7414:


Yeah [~brahmareddy], the edit log viewer shows the same. I'm suspecting there 
could be chances of the following:

 Namenode got shutdown and can't recover where edit update might be missed
 -

 Key: HDFS-7414
 URL: https://issues.apache.org/jira/browse/HDFS-7414
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1, 2.5.1
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Blocker

 Scenario:
 
 Was running a mapreduce job.
 CPU usage crossed 190% for the Datanode and the machine became slow,
 and we saw the following exception. 
  *Did not get the exact root cause, but with CPU usage that high an edit log 
 update might have been missed... Need to dig more... anyone have any thoughts?* 
 {noformat}
 2014-11-20 05:01:18,430 | ERROR | main | Encountered exception on operation 
 CloseOp [length=0, inodeId=0, 
 path=/outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025,
  replication=2, mtime=1416409309023, atime=1416409290816, blockSize=67108864, 
 blocks=[blk_1073766144_25321, blk_1073766154_25331, blk_1073766160_25337], 
 permissions=mapred:supergroup:rw-r--r--, aclEntries=null, clientName=, 
 clientMachine=, opCode=OP_CLOSE, txid=162982] | 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
 java.io.FileNotFoundException: File does not exist: 
 /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:741)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:724)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1387)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1459)
 2014-11-20 05:01:18,654 | WARN  | main | Encountered exception loading 
 fsimage | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:642)
 java.io.FileNotFoundException: File does not exist: 
 /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:741)
 at 
 

[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed

2014-12-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246440#comment-14246440
 ] 

Rakesh R commented on HDFS-7414:


Yeah Brahma, the edit log viewer shows the same. I suspect the two operations 
below occurred concurrently. Let me try to reproduce it.

Operation 1) An internal lease release occurred and initialized block recovery. 
This adds the OP_CLOSE entry.
Operation 2) The client deleted the file. This adds an OP_DELETE entry.
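
A hypothetical repro sketch under that assumption (MiniDFSCluster-based, names 
illustrative, untested; requires the hadoop-hdfs test jar):
{code}
public void testCloseDeleteRace() throws Exception {
  Configuration conf = new HdfsConfiguration();
  MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
      .numDataNodes(2).build();
  try {
    DistributedFileSystem fs = cluster.getFileSystem();
    Path p = new Path("/test-race");
    FSDataOutputStream out = fs.create(p, (short) 2);
    out.write(new byte[4096]);
    out.hflush();                // leave the file open (under construction)
    fs.recoverLease(p);          // operation-1: block recovery adds OP_CLOSE
    fs.delete(p, false);         // operation-2: OP_DELETE for the same path
    cluster.restartNameNode();   // replay edits: does loading hit the FNFE?
  } finally {
    cluster.shutdown();
  }
}
{code}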


[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed

2014-12-15 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246451#comment-14246451
 ] 

Vinayakumar B commented on HDFS-7414:
-

Looks like you hit HDFS-6825, which was also due to extra OP_CLOSE edits. 
Check whether you got the stacktrace mentioned in HDFS-6825 to confirm the same.


[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed

2014-12-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246471#comment-14246471
 ] 

Rakesh R commented on HDFS-7414:


Hi Vinay, seeing the [HDFS-6825 comment 
|https://issues.apache.org/jira/browse/HDFS-6825?focusedCommentId=14098682&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14098682],
 it looks like a similar case.


[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails

2014-12-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-7471:
-
  Component/s: test
 Priority: Major  (was: Minor)
Affects Version/s: 3.0.0

Uprating to Major, as this is currently the sole test blocking Jenkins HDFS runs.

 TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
 -

 Key: HDFS-7471
 URL: https://issues.apache.org/jira/browse/HDFS-7471
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Ted Yu

 From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ :
 {code}
 FAILED:  
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
 Error Message:
 The map of version counts returned by DatanodeManager was not what it was 
 expected to be on iteration 237 expected:<0> but was:<1>
 Stack Trace:
 java.lang.AssertionError: The map of version counts returned by 
 DatanodeManager was not what it was expected to be on iteration 237 
 expected:<0> but was:<1>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
 {code}





[jira] [Commented] (HDFS-7480) Namenodes loops on 'block does not belong to any file' after deleting many files

2014-12-15 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246605#comment-14246605
 ] 

Frode Halvorsen commented on HDFS-7480:
---

I will test when 2.6.1 is released..

 Namenodes loops on 'block does not belong to any file' after deleting many 
 files
 

 Key: HDFS-7480
 URL: https://issues.apache.org/jira/browse/HDFS-7480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
 Environment: CentOS - HDFS-HA (journal), zookeeper
Reporter: Frode Halvorsen

 A small cluster has 8 servers with 32 G RAM.
 Two are namenodes (HA-configured); six are datanodes (8x3 TB disks configured 
 with RAID as one 21 TB drive).
 The cluster receives on average 400,000 small files each day. I started 
 archiving (HAR) each day as a separate archive. After deleting the original 
 files for one month, the namenodes started acting up really badly.
 When restarting them, both active and passive nodes seem to work OK for some 
 time, but then start to report a lot of blocks belonging to no files, and 
 the namenode just spins on those messages in a massive loop. If the passive 
 node is first, it also influences the active node in such a way that it's no 
 longer possible to archive new files. If the active node also starts in this 
 loop, it suddenly dies without any error message.
 The only way I'm able to get rid of the problem is to start decommissioning 
 nodes, watching the cluster closely to avoid downtime, and making sure every 
 datanode gets a 'clean' start. After all datanodes have been decommissioned (in 
 turns) and restarted with clean disks, the problem is gone. But if I then 
 delete a lot of files in a short time, the problem starts again...  
 The main problem (I think) is that the receiving and reporting of those 
 blocks takes so many resources that the namenodes are too busy to tell the 
 datanodes to delete those blocks. 
 If the active namenode starts on the loop, it does the 'right' thing by 
 telling the datanode to invalidate the block, but the amount of blocks is so 
 massive that the namenode doesn't do anything else. Just now, I have about 
 1200-1400 log entries per second on the passive node.
 Update:
 Just got the active namenode in the loop - it logs 1000 lines per second: 
 500 'BlockStateChange: BLOCK* processReport: blk_1080796332_7056241 on 
 x.x.x.x:50010 size 1742 does not belong to any file'
 and 
 500 'BlockStateChange: BLOCK* InvalidateBlocks: add blk_1080796332_7056241 
 to x.x.x.x:50010'





[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246672#comment-14246672
 ] 

Kihwal Lee commented on HDFS-6425:
--

[~mingma] The patch looks good, but does not apply to trunk any more. Can you 
refresh it?

 Large postponedMisreplicatedBlocks has impact on blockReport latency
 

 Key: HDFS-6425
 URL: https://issues.apache.org/jira/browse/HDFS-6425
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6425-2.patch, HDFS-6425-Test-Case.pdf, 
 HDFS-6425.patch


 Sometimes we have a large number of over-replicated blocks when the NN fails 
 over. When the new active NN takes over, over-replicated blocks are put into 
 postponedMisreplicatedBlocks until all DNs for that block are no longer 
 stale.
 We have a case where the NNs flip-flop. Before postponedMisreplicatedBlocks 
 became empty, the NN failed over again and again, so postponedMisreplicatedBlocks 
 just kept increasing until the cluster was stable. 
 In addition, a large postponedMisreplicatedBlocks can make 
 rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
 takes the write lock, so it can slow down block report processing.





[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246679#comment-14246679
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661072/HDFS-6425-2.patch
  against trunk revision fae3e86.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9037//console

This message is automatically generated.



[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246808#comment-14246808
 ] 

Kihwal Lee commented on HDFS-6425:
--

Did you have a chance to analyze the cause of the large number of 
over-replicated blocks? It might be due to the race between completeFile and 
incremental block reports. If a file is closed with just min_replicas and the 
replication monitor runs before the rest of the incremental block reports are 
received, replication will be scheduled, and this will lead to over-replication. 



[jira] [Commented] (HDFS-7503) Namenode restart after large deletions can cause slow processReport (due to logging)

2014-12-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246809#comment-14246809
 ] 

Suresh Srinivas commented on HDFS-7503:
---

A minor nit:
{code}
+blockLog.info("BLOCK* processReport: " + b + " on " + node
+    + " size " + b.getNumBytes()
+    + " does not belong to any file");
{code}
We can print the repetitive node information, and the note that the blocks do 
not belong to any file, once outside the for loop. 
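
Something like the following rough sketch, assuming the surrounding loop 
iterates over the invalidation candidates (variable names are illustrative):
{code}
for (Block b : toInvalidate) {
  // per-block detail only; node and explanation are constant across iterations
  blockLog.info("BLOCK* processReport: " + b + " size " + b.getNumBytes());
}
if (!toInvalidate.isEmpty()) {
  // emit the repetitive context once, after the loop
  blockLog.info("BLOCK* processReport: " + toInvalidate.size()
      + " blocks on " + node + " do not belong to any file");
}
{code}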

 Namenode restart after large deletions can cause slow processReport (due to 
 logging)
 

 Key: HDFS-7503
 URL: https://issues.apache.org/jira/browse/HDFS-7503
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 1.2.1, 2.6.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 1.3.0, 2.6.1

 Attachments: HDFS-7503.branch-1.02.patch, HDFS-7503.branch-1.patch, 
 HDFS-7503.trunk.01.patch, HDFS-7503.trunk.02.patch


 If a large directory is deleted and the namenode is immediately restarted, 
 there are a lot of blocks that do not belong to any file. This results in a log:
 {code}
 2014-11-08 03:11:45,584 INFO BlockStateChange 
 (BlockManager.java:processReport(1901)) - BLOCK* processReport: 
 blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file.
 {code}
 This log is printed within the FSNamesystem lock. This can cause the namenode 
 to take a long time to come out of safemode.
 One solution is to downgrade the logging level.
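
A minimal sketch of that option, assuming blockLog is the BLOCK* logger (the 
isDebugEnabled() guard also skips building the message string when DEBUG is off):
{code}
if (blockLog.isDebugEnabled()) {
  blockLog.debug("BLOCK* processReport: " + b + " on " + node
      + " size " + b.getNumBytes() + " does not belong to any file");
}
{code}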





[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2014-12-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246822#comment-14246822
 ] 

Hari Sekhon commented on HDFS-5442:
---

MapR's approach to DR is perhaps the best in the Hadoop world right now. 
MapR-FS takes snapshots and replicates those snapshots to the other site. When 
the snapshot is fully copied, it is atomically enabled at the other site.

This is the best possible scenario for consistency, and it has worked well, 
including the built-in scheduling.

So perhaps HDFS DR requires two administrative options, depending on what 
is required:

1. Streaming continuous block replication (inconsistent unless you guarantee 
block write ordering, which WANdisco does not)
2. Atomic snapshot mirroring + enabling at the other site, like MapR-FS (a 
rough sketch follows at the end of this comment)

I suspect option 2 will require some improvement to HDFS snapshots to allow 
rolling a snapshot forward at the DR site once it's complete.

Also, option 2 allows for schedule changes, i.e. snap-copy every 15 minutes, 
every hour, or every day, so you only get the net changes and not every 
single intermediate change, which may mean less data copied (although I doubt 
that in practice, unless people are rewriting/replacing datasets, as with HBase 
compactions).

Regardless of the solution, there must be configurable path exclusions, such as 
for /tmp and other places of intermediate data.
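
As a rough illustration of option 2's primary-side half (names illustrative; the 
copy and the atomic enable at the DR site are assumed to be an external 
mirroring job):
{code}
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path data = new Path("/data");
dfs.allowSnapshot(data);  // one-time admin step: make /data snapshottable
// Each cycle produces an immutable point-in-time view under /data/.snapshot/
Path snap = dfs.createSnapshot(data, "dr-2014-12-15");
// A mirroring job (e.g. distcp over the read-only snapshot path) copies it,
// and the DR site atomically "enables" it once fully transferred.
{code}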

 Zero loss HDFS data replication for multiple datacenters
 

 Key: HDFS-5442
 URL: https://issues.apache.org/jira/browse/HDFS-5442
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Avik Dey
Assignee: Dian Fu
 Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster 
 Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf


 Hadoop is architected to operate efficiently at scale for normal hardware 
 failures within a datacenter. Hadoop is not designed today to handle 
 datacenter failures. Although HDFS is not designed for nor deployed in 
 configurations spanning multiple datacenters, replicating data from one 
 location to another is common practice for disaster recovery and global 
 service availability. There are current solutions available for batch 
 replication using data copy/export tools. However, while providing some 
 backup capability for HDFS data, they do not provide the capability to 
 recover all your HDFS data from a datacenter failure and be up and running 
 again with a fully operational Hadoop cluster in another datacenter in a 
 matter of minutes. For disaster recovery from a datacenter failure, we should 
 provide a fully distributed, zero data loss, low latency, high throughput and 
 secure HDFS data replication solution for multiple datacenter setup.
 Design and code for Phase-1 to follow soon.





[jira] [Commented] (HDFS-6485) Transfer data from primary cluster to mirror cluster synchronously

2014-12-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246824#comment-14246824
 ] 

Hari Sekhon commented on HDFS-6485:
---

Would this be covered by HDFS-5442?

 Transfer data from primary cluster to mirror cluster synchronously
 --

 Key: HDFS-6485
 URL: https://issues.apache.org/jira/browse/HDFS-6485
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jiang, Wenjie
Assignee: Jiang, Wenjie
 Attachments: HDFS-6485.patch


 In the sync mode of Disaster Recovery, the namenode in the primary cluster 
 will return a pipeline including datanodes in both the primary and mirror 
 clusters to the DFSClient, and then the DFSClient will write data with the 
 existing HDFS architecture.





[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2014-12-15 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246829#comment-14246829
 ] 

Uma Maheswara Rao G commented on HDFS-5442:
---

I am on business travel to China from the 13th to the 21st of Dec. Please allow 
for my delayed responses during this period.





[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2014-12-15 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246835#comment-14246835
 ] 

Konstantin Boudnik commented on HDFS-5442:
--

bq. MapR's approach to DR is perhaps the best in the Hadoop world right now. 
MapR-FS takes snapshots and replicates those snapshots to the other site.
It's hardly the best, because snapshots by definition aren't real-time, so your 
DR side is always behind the primary. And in case of a disastrous event you're 
going to lose not-yet-snapshotted data or data in flight. 



[jira] [Created] (HDFS-7524) TestRetryCacheWithHA.testUpdatePipeline failed in trunk

2014-12-15 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-7524:
---

 Summary: TestRetryCacheWithHA.testUpdatePipeline failed in trunk
 Key: HDFS-7524
 URL: https://issues.apache.org/jira/browse/HDFS-7524
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Reporter: Yongjun Zhang


https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/

Error Message
{quote}
After waiting the operation updatePipeline still has not taken effect on NN yet
Stacktrace

java.lang.AssertionError: After waiting the operation updatePipeline still has 
not taken effect on NN yet
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testClientRetryWithFailover(TestRetryCacheWithHA.java:1278)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline(TestRetryCacheWithHA.java:1176)
{quote}

Found by tool proposed in HADOOP-11045:

{quote}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
Hadoop-Hdfs-trunk -n 5 | tee bt.log
Recently FAILED builds in url: 
https://builds.apache.org//job/Hadoop-Hdfs-trunk
THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as 
listed below:

===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
Failed test: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 
10:32:27)
Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization

Among 6 runs examined, all failed tests #failedRuns: testName:
3: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
2: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
1: 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
{quote}






[jira] [Created] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect failed in trunk

2014-12-15 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-7525:
---

 Summary: TestDatanodeManager.testNumVersionsReportedCorrect failed 
in trunk
 Key: HDFS-7525
 URL: https://issues.apache.org/jira/browse/HDFS-7525
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Reporter: Yongjun Zhang


https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/

{quote}
Error Message

The map of version counts returned by DatanodeManager was not what it was 
expected to be on iteration 484 expected:<0> but was:<1>
Stacktrace

java.lang.AssertionError: The map of version counts returned by DatanodeManager 
was not what it was expected to be on iteration 484 expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
{quote}

Found by tool proposed in HADOOP-11045:

{quote}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
Hadoop-Hdfs-trunk -n 5 | tee bt.log
Recently FAILED builds in url: 
https://builds.apache.org//job/Hadoop-Hdfs-trunk
THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as 
listed below:

===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
Failed test: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 
10:32:27)
Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization

Among 6 runs examined, all failed tests #failedRuns: testName:
3: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
2: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
1: 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
{quote}






[jira] [Updated] (HDFS-7524) TestRetryCacheWithHA.testUpdatePipeline failed occasionally in trunk

2014-12-15 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7524:

Summary: TestRetryCacheWithHA.testUpdatePipeline failed occasionally in 
trunk  (was: TestRetryCacheWithHA.testUpdatePipeline failed in trunk)



[jira] [Created] (HDFS-7526) SetReplication OutOfMemoryError

2014-12-15 Thread Philipp Schuegerl (JIRA)
Philipp Schuegerl created HDFS-7526:
---

 Summary: SetReplication OutOfMemoryError
 Key: HDFS-7526
 URL: https://issues.apache.org/jira/browse/HDFS-7526
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Philipp Schuegerl


Setting the replication of an HDFS folder recursively can run out of memory, 
e.g. with a large /var/log directory:

hdfs dfs -setrep -R -w 1 /var/log

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit 
exceeded
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at java.net.URI$Parser.substring(URI.java:2850)
at java.net.URI$Parser.parse(URI.java:3046)
at java.net.URI.<init>(URI.java:753)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
at org.apache.hadoop.fs.Path.<init>(Path.java:116)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
at 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
at 
org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at 
org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
at 
org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)
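
For reference, a hedged client-side sketch of how the shell could avoid 
materializing whole directory listings, using the streaming FileSystem#listFiles 
iterator instead of recursive listStatus arrays (untested):
{code}
FileSystem fs = FileSystem.get(conf);
// listFiles(path, true) streams entries through a RemoteIterator instead of
// building a FileStatus[] for every directory, keeping the client heap flat.
RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/var/log"), true);
while (it.hasNext()) {
  fs.setReplication(it.next().getPath(), (short) 1);
}
{code}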





[jira] [Updated] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk

2014-12-15 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7525:

Summary: TestDatanodeManager.testNumVersionsReportedCorrect fails 
occasionally in trunk  (was: 
TestDatanodeManager.testNumVersionsReportedCorrect failed in trunk)



[jira] [Updated] (HDFS-7524) TestRetryCacheWithHA.testUpdatePipeline fails occasionally in trunk

2014-12-15 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7524:

Summary: TestRetryCacheWithHA.testUpdatePipeline fails occasionally in 
trunk  (was: TestRetryCacheWithHA.testUpdatePipeline failed occasionally in 
trunk)



[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters

2014-12-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246862#comment-14246862
 ] 

Hari Sekhon commented on HDFS-5442:
---

Zero loss is practically impossible unless you do synchronous, high-latency 
writes to both sites, so neither WANdisco nor MapR can claim zero loss while 
still performing well. I've also seen significant, unmanageable streaming async 
block replication lag of greater than several dozen minutes (i.e. significant 
potential data loss) when using a well-known proprietary HDFS add-on...

With an atomic snapshot mirroring mode you will at least know to what point in 
time your data is consistent and can work with that, rather than having to 
fsck to find out which data has random holes in it from blocks that haven't 
made it across in the random low-priority replication.

For option 1 it would be better if block write ordering could be maintained and 
replayed at the other site in the same order, for chronological consistency up 
to the latest DR checkpoint, in case any non-trivial application sitting on top 
of the filesystem isn't prepared for having holes in its data, e.g. WAL logs or 
distributed SQL databases' redo logs sitting on top of HDFS (some solutions 
might do their own replication, in which case that should be excluded via my 
previously mentioned configurable path exclusions).

The final thing that HDFS DR should have is administrative, active foreground 
block repair for off-peak times, to catch up faster by maxing out the bandwidth 
(or the maximum bandwidth settings you've specified).

Ultimately both option 1 and option 2 should be provided, since each is better 
for different use cases. Option 2 has been done very well by MapR; option 1 
hasn't been done perfectly by anyone I've seen yet, but I'm very eager for this 
to be done (anyone at Hortonworks reading this??? ;)  ).

 Zero loss HDFS data replication for multiple datacenters
 

 Key: HDFS-5442
 URL: https://issues.apache.org/jira/browse/HDFS-5442
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Avik Dey
Assignee: Dian Fu
 Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster 
 Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf


 Hadoop is architected to operate efficiently at scale for normal hardware 
 failures within a datacenter. Hadoop is not designed today to handle 
 datacenter failures. Although HDFS is not designed for nor deployed in 
 configurations spanning multiple datacenters, replicating data from one 
 location to another is common practice for disaster recovery and global 
 service availability. There are current solutions available for batch 
 replication using data copy/export tools. However, while providing some 
 backup capability for HDFS data, they do not provide the capability to 
 recover all your HDFS data from a datacenter failure and be up and running 
 again with a fully operational Hadoop cluster in another datacenter in a 
 matter of minutes. For disaster recovery from a datacenter failure, we should 
 provide a fully distributed, zero data loss, low latency, high throughput and 
 secure HDFS data replication solution for multiple datacenter setup.
 Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7526) SetReplication OutOfMemoryError

2014-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246864#comment-14246864
 ] 

Kihwal Lee commented on HDFS-7526:
--

Does recursively listing the same directory also cause an OOM? If it does, it 
is a known issue: until we fix FsShell to use the new remote iterator-based 
API, it will continue to be a problem.
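
For illustration, a minimal sketch of the iterator-based approach (my own 
example, not the FsShell code; the use of {{FileSystem#listFiles}} here is an 
assumption about how such a fix could look). The remote iterator fetches 
directory entries in batches, so client memory stays bounded no matter how 
large the tree is:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class SetRepIteratively {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path root = new Path(args.length > 0 ? args[0] : "/var/log");
    short replication = 1;
    // listFiles(path, true) walks the tree through a remote iterator,
    // holding only one batch of statuses in memory at a time instead of
    // materializing every file status up front.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      if (status.isFile()) {
        fs.setReplication(status.getPath(), replication);
      }
    }
  }
}
{code}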

 SetReplication OutOfMemoryError
 ---

 Key: HDFS-7526
 URL: https://issues.apache.org/jira/browse/HDFS-7526
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Philipp Schuegerl

 Setting the replication of a HDFS folder recursively can run out of memory. 
 E.g. with a large /var/log directory:
 hdfs dfs -setrep -R -w 1 /var/log
 Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit 
 exceeded
   at java.util.Arrays.copyOfRange(Arrays.java:2694)
   at java.lang.String.<init>(String.java:203)
   at java.lang.String.substring(String.java:1913)
   at java.net.URI$Parser.substring(URI.java:2850)
   at java.net.URI$Parser.parse(URI.java:3046)
   at java.net.URI.<init>(URI.java:753)
   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
   at org.apache.hadoop.fs.Path.<init>(Path.java:116)
   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
   at 
 org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
   at 
 org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
   at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
   at 
 org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
   at 
 org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at 
 org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk

2014-12-15 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-7527:
---

 Summary: TestDecommission.testIncludeByRegistrationName fails 
occasionally in trunk
 Key: HDFS-7527
 URL: https://issues.apache.org/jira/browse/HDFS-7527
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Reporter: Yongjun Zhang


https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/

{quote}
Error Message

test timed out after 36 milliseconds
Stacktrace

java.lang.Exception: test timed out after 36 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)


2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) 
- Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 
(Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied 
communication with namenode because the host is not in the include-list: 
DatanodeRegistration(127.0.0.1, 
datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
infoSecurePort=0, ipcPort=43726, 
storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
at 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)


2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) 
- Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 
(Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. 
java.io.IOException: DN shut down before block pool connected
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:745)
{quote}

Found by tool proposed in HADOOP-11045:

{quote}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
Hadoop-Hdfs-trunk -n 5 | tee bt.log
Recently FAILED builds in url: 
https://builds.apache.org//job/Hadoop-Hdfs-trunk
THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as 
listed below:

===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
Failed test: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 
10:32:27)
Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 
03:30:01)
Failed test: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization

Among 6 runs examined, all failed tests #failedRuns: testName:
3: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
2: 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
1: 
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HDFS-7513) HDFS inotify: add defaultBlockSize to CreateEvent

2014-12-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7513:
---
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

 HDFS inotify: add defaultBlockSize to CreateEvent
 -

 Key: HDFS-7513
 URL: https://issues.apache.org/jira/browse/HDFS-7513
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7513.001.patch, HDFS-7513.002.patch, 
 HDFS-7513.003.patch


 HDFS inotify: add defaultBlockSize to CreateEvent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7506) Consolidate implementation of setting inode attributes into a single class

2014-12-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246975#comment-14246975
 ] 

Jing Zhao commented on HDFS-7506:
-

+1

 Consolidate implementation of setting inode attributes into a single class
 --

 Key: HDFS-7506
 URL: https://issues.apache.org/jira/browse/HDFS-7506
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7506.000.patch, HDFS-7506.001.patch, 
 HDFS-7506.001.patch, HDFS-7506.002.patch, HDFS-7506.003.patch


 This jira proposes to consolidate the implementation of setting inode 
 attributes (i.e., times, permissions, owner, etc.) to a single class for 
 better maintainability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-2256) we should add a wait for non-safe mode and call dfsadmin -report in start-dfs

2014-12-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-2256.

Resolution: Won't Fix

Closing this as won't fix.

 we should add a wait for non-safe mode and call dfsadmin -report in start-dfs
 -

 Key: HDFS-2256
 URL: https://issues.apache.org/jira/browse/HDFS-2256
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: scripts
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 I think we should add a call to wait for safe mode exit and print the dfs 
 report to show upgrades that are in progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hdfs-nfs project

2014-12-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7516:
-
Attachment: HDFS-7516.002.patch

Uploaded a new patch to address Haohui's comment.

 Fix findbugs warnings in hdfs-nfs project
 -

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-6239) start-dfs.sh does not start remote DataNode due to escape characters

2014-12-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-6239.

Resolution: Won't Fix

Hadoop 1.x is dead and trunk/3.x has completely different code for this now. 
Closing as won't fix.

 start-dfs.sh does not start remote DataNode due to escape characters
 

 Key: HDFS-6239
 URL: https://issues.apache.org/jira/browse/HDFS-6239
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 1.2.1
 Environment: GNU bash, version 4.1.2(1)-release 
 (x86_64-redhat-linux-gnu)
 Linux foo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 
 x86_64 x86_64 x86_64 GNU/Linux
 AFS file system.
Reporter: xyzzy

 start-dfs.sh fails to start remote data nodes and task nodes, though it is 
 possible to start them manually through hadoop-daemon.sh.
 I've been able to debug and find the root cause of the bug, and I thought it 
 was a trivial fix, but I do not know how to do it. I can't figure out a way 
 to handle this seemingly trivial bug.
 hadoop-daemons.sh calls slaves.sh:
 exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; 
 "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
 This is the issue when I debug using bash -x: in slaves.sh, the \; becomes ';'
 + ssh .xx..xxx cd /afs/xx..xxx/x/x/x/xx/x/libexec/.. ';' 
 /afs/xx..xxx/x/x/x/xx//bin/hadoop-daemon.sh --config 
 /afs/xx..xxx/x/x/x/xx//libexec/../conf start datanode
 The problem is ';' . Because the semi-colon is surrounded by quotes, it 
 doesn't execute the code after that. I manually ran the above command, and as 
 expected the data node did not start. When I removed the quotes around the 
 semi-colon, everything works. Please note that you can see the issue only 
 when you do bash -x. If you echo the statement, the quotes around the 
 semi-colon are not visible.
 This issue is always reproducible for me, and because of it, I have to 
 manually start daemons on each machine. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2628) Remove Mapred filenames from HDFS findbugsExcludeFile.xml file

2014-12-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-2628:
---
Component/s: (was: scripts)
 test

 Remove Mapred filenames from HDFS findbugsExcludeFile.xml file
 --

 Key: HDFS-2628
 URL: https://issues.apache.org/jira/browse/HDFS-2628
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: Uma Maheswara Rao G
Priority: Minor

 MapReduce filenames are present in 
 hadoop-hdfs-project\hadoop-hdfs\dev-support\findbugsExcludeFile.xml.
 Is this intentional? I think we should remove them from HDFS.
 Example:
 Exampl:
 {code}
   <!--
     Ignore warnings where child class has the same name as
     super class. Classes based on Old API shadow names from
     new API. Should go off after HADOOP-1.0
   -->
   <Match>
     <Class name="~org.apache.hadoop.mapred.*" />
     <Bug pattern="NM_SAME_SIMPLE_NAME_AS_SUPERCLASS" />
   </Match>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7513) HDFS inotify: add defaultBlockSize to CreateEvent

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247000#comment-14247000
 ] 

Hudson commented on HDFS-7513:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6718 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6718/])
HDFS-7513. HDFS inotify: add defaultBlockSize to CreateEvent (cmccabe) 
(cmccabe: rev 6e13fc62e1f284f22fd0089f06ce281198bc7c2a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/inotify/Event.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/InotifyFSEditLogOpTranslator.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/inotify.proto


 HDFS inotify: add defaultBlockSize to CreateEvent
 -

 Key: HDFS-7513
 URL: https://issues.apache.org/jira/browse/HDFS-7513
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7513.001.patch, HDFS-7513.002.patch, 
 HDFS-7513.003.patch


 HDFS inotify: add defaultBlockSize to CreateEvent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4063) Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.

2014-12-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-4063.

Resolution: Won't Fix

Closing this as won't fix.  Hadoop 1.x is dead and this code has been removed 
from modern versions of Hadoop.

 Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.
 

 Key: HDFS-4063
 URL: https://issues.apache.org/jira/browse/HDFS-4063
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts, tools
Affects Versions: 1.0.3, 1.1.0, 2.0.2-alpha
 Environment: Fedora 17 3.3.4-5.fc17.x86_64t, java version 
 1.7.0_06-icedtea, Rackspace Cloud (NextGen)
Reporter: Haoquan Wang
Priority: Minor
  Labels: patch
   Original Estimate: 1h
  Remaining Estimate: 1h

 The JAVA_HOME directory remains unchanged no matter what you enter when you 
 run hadoop-setup-conf.sh to generate Hadoop configurations. Please see the 
 example below:
 *
 [root@hadoop-slave ~]# /sbin/hadoop-setup-conf.sh
 Setup Hadoop Configuration
 Where would you like to put config directory? (/etc/hadoop)
 Where would you like to put log directory? (/var/log/hadoop)
 Where would you like to put pid directory? (/var/run/hadoop)
 What is the host of the namenode? (hadoop-slave)
 Where would you like to put namenode data directory? 
 (/var/lib/hadoop/hdfs/namenode)
 Where would you like to put datanode data directory? 
 (/var/lib/hadoop/hdfs/datanode)
 What is the host of the jobtracker? (hadoop-slave)
 Where would you like to put jobtracker/tasktracker data directory? 
 (/var/lib/hadoop/mapred)
 Where is JAVA_HOME directory? (/usr/java/default) *+/usr/lib/jvm/jre+*
 Would you like to create directories/copy conf files to localhost? (Y/n)
 Review your choices:
 Config directory: /etc/hadoop
 Log directory   : /var/log/hadoop
 PID directory   : /var/run/hadoop
 Namenode host   : hadoop-slave
 Namenode directory  : /var/lib/hadoop/hdfs/namenode
 Datanode directory  : /var/lib/hadoop/hdfs/datanode
 Jobtracker host : hadoop-slave
 Mapreduce directory : /var/lib/hadoop/mapred
 Task scheduler  : org.apache.hadoop.mapred.JobQueueTaskScheduler
 JAVA_HOME directory : *+/usr/java/default+*
 Create dirs/copy conf files : y
 Proceed with generate configuration? (y/N) n
 User aborted setup, exiting...
 *
 Resolution:
 Amend line 509 in file /sbin/hadoop-setup-conf.sh
 from:
 JAVA_HOME=${USER_USER_JAVA_HOME:-$JAVA_HOME}
 to:
 JAVA_HOME=${USER_JAVA_HOME:-$JAVA_HOME}
 will resolve this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7506) Consolidate implementation of setting inode attributes into a single class

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7506:
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks for the review.

 Consolidate implementation of setting inode attributes into a single class
 --

 Key: HDFS-7506
 URL: https://issues.apache.org/jira/browse/HDFS-7506
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7506.000.patch, HDFS-7506.001.patch, 
 HDFS-7506.001.patch, HDFS-7506.002.patch, HDFS-7506.003.patch


 This jira proposes to consolidate the implementation of setting inode 
 attributes (i.e., times, permissions, owner, etc.) to a single class for 
 better maintainability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7506) Consolidate implementation of setting inode attributes into a single class

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247023#comment-14247023
 ] 

Hudson commented on HDFS-7506:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6719 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6719/])
HDFS-7506. Consolidate implementation of setting inode attributes into a single 
class. Contributed by Haohui Mai. (wheat9: rev 
832ebd8cb63d91b4aa4bfed412b9799b3b9be4a7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java


 Consolidate implementation of setting inode attributes into a single class
 --

 Key: HDFS-7506
 URL: https://issues.apache.org/jira/browse/HDFS-7506
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7506.000.patch, HDFS-7506.001.patch, 
 HDFS-7506.001.patch, HDFS-7506.002.patch, HDFS-7506.003.patch


 This jira proposes to consolidate the implementation of setting inode 
 attributes (i.e., times, permissions, owner, etc.) to a single class for 
 better maintainability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7516) Fix findbugs warnings in hdfs-nfs project

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247070#comment-14247070
 ] 

Hadoop QA commented on HDFS-7516:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687278/HDFS-7516.002.patch
  against trunk revision 6e13fc6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9038//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9038//console

This message is automatically generated.

 Fix findbugs warnings in hdfs-nfs project
 -

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project

2014-12-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7516:
-
Summary: Fix findbugs warnings in hadoop-nfs project  (was: Fix findbugs 
warnings in hdfs-nfs project)

 Fix findbugs warnings in hadoop-nfs project
 ---

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project

2014-12-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7516:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Fix findbugs warnings in hadoop-nfs project
 ---

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247094#comment-14247094
 ] 

Hudson commented on HDFS-7516:
--

ABORTED: Integrated in Hadoop-trunk-Commit #6721 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6721/])
HDFS-7516. Fix findbugs warnings in hdfs-nfs project. Contributed by Brandon Li 
(brandonli: rev 42d8858c5d237c4d9ab439c570a17b7fcaf781c2)
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/CredentialsSys.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/SYMLINK3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/LOOKUP3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/RMDIR3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/LINK3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountResponse.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/CREATE3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/MKDIR3Request.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/REMOVE3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/RENAME3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/MKNOD3Request.java
* 
hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/FileHandle.java


 Fix findbugs warnings in hadoop-nfs project
 ---

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.7.0

 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project

2014-12-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-7516:
-
Fix Version/s: 2.7.0

 Fix findbugs warnings in hadoop-nfs project
 ---

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.7.0

 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project

2014-12-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247097#comment-14247097
 ] 

Brandon Li commented on HDFS-7516:
--

Thank you, [~wheat9], for the review. I've committed the patch.

 Fix findbugs warnings in hadoop-nfs project
 ---

 Key: HDFS-7516
 URL: https://issues.apache.org/jira/browse/HDFS-7516
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.7.0
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.7.0

 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3

2014-12-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7023:
---
Fix Version/s: HDFS-6994
 Target Version/s: HDFS-6994
Affects Version/s: HDFS-6994
   Status: Patch Available  (was: In Progress)

Committed to branch, thanks for the review.

 use libexpat instead of libxml2 for libhdfs3
 

 Key: HDFS-7023
 URL: https://issues.apache.org/jira/browse/HDFS-7023
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: HDFS-6994
Reporter: Zhanwei Wang
Assignee: Colin Patrick McCabe
 Fix For: HDFS-6994

 Attachments: HDFS-7023-pnative.002.patch, HDFS-7023.001.pnative.patch


 As commented in HDFS-6994, libxml2 may have some thread-safety issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247217#comment-14247217
 ] 

Hadoop QA commented on HDFS-7023:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12683901/HDFS-7023-pnative.002.patch
  against trunk revision e597249.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9039//console

This message is automatically generated.

 use libexpat instead of libxml2 for libhdfs3
 

 Key: HDFS-7023
 URL: https://issues.apache.org/jira/browse/HDFS-7023
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: HDFS-6994
Reporter: Zhanwei Wang
Assignee: Colin Patrick McCabe
 Fix For: HDFS-6994

 Attachments: HDFS-7023-pnative.002.patch, HDFS-7023.001.pnative.patch


 As commented in HDFS-6994, libxml2 may have some thread-safety issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7411:
--
Attachment: hdfs-7411.005.patch

Thanks again Arpit for commenting. The TestOpenFilesWithSnapshot failure was 
caused by the block report change; some digging shows the NN won't come out of 
startup safe mode. I'm going to defer that fix until later, since it's 
unrelated to the decommission manager refactor itself.

New patch attached without the block report change, but with the test 
improvements.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2014-12-15 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247283#comment-14247283
 ] 

Lei (Eddy) Xu commented on HDFS-6440:
-

Hey [~jesse_yates], thanks for your answers!

I have a few further questions regarding the patch:

1. I did not see where {{isPrimaryCheckPointer}} is set to {{false}}. 

{code:title=StandbyCheckpointer.java}
private boolean isPrimaryCheckPointer = true;
...
if (upload.get() == TransferFsImage.TransferResult.SUCCESS) {
   this.isPrimaryCheckPointer = true;
   //avoid getting the rest of the results - we don't care since we had a successful upload
   break;
}
{code}

I guess the default value of {{isPrimaryCheckPointer}} might be a typo, and 
should be {{false}}. Moreover, is there a case where an SNN switches from 
primary checkpointer to non-primary checkpointer?


2. Is the following condition correct? I think only {{sendRequest}} is needed. 

{code:title=StandbyCheckpointer.java}
if (needCheckpoint && sendRequest) {
{code}
 
Also in the old code,

{code}
} else if (secsSinceLast >= checkpointConf.getPeriod()) {
  LOG.info("Triggering checkpoint because it has been " +
      secsSinceLast + " seconds since the last checkpoint, which " +
      "exceeds the configured interval " +
      checkpointConf.getPeriod());
  needCheckpoint = true;
}
{code}

Does it imply that if {{secsSinceLast >= checkpointConf.getPeriod()}} is 
{{true}} then {{secsSinceLast >= checkpointConf.getQuietPeriod()}} is always 
{{true}} for the default {{quiet multiplier}} value? If that is the case, are 
these conditions duplicated?

It might be easier to let the ANN calculate the above conditions, since it has 
the actual system-wide knowledge of the last upload and last txnid. It could be 
a nice optimization later.

3. When it uploads fsimage, are {{SC_CONFLICT}} and {{SC_EXPECTATION_FAILED}} 
not handled in the SNN in the current patch? Do you plan to handle them in a 
following patch?

4. Could you set {{EditLogTailer#maxRetries}} to {{private final}}? Do we need 
to enforce an acceptable value range for {{maxRetries}}? For instance, in the 
following code, it would not try every NN when {{nextNN = nns.size() - 1}} and 
{{maxRetries = 1}} (see the sketch after these comments):

{code}
// if we have reached the max loop count, quit by returning null
if (nextNN / nns.size() >= maxRetries) {
  return null;
}
{code}

5. There are a few formatting-only changes, e.g., in {{doCheckpointing()}}. 
Could you remove them to reduce the size of the patch?

Also, the following code is indented incorrectly.

{code}
int i = 0;
  for (; i < uploads.size(); i++) {
Future<TransferFsImage.TransferResult> upload = uploads.get(i);
try {
  // TODO should there be some smarts here about retries nodes that are not the active NN?
  if (upload.get() == TransferFsImage.TransferResult.SUCCESS) {
this.isPrimaryCheckPointer = true;
//avoid getting the rest of the results - we don't care since we had a successful upload
break;
  }
} catch (ExecutionException e) {
  ioe = new IOException("Exception during image upload: " + e.getMessage(),
  e.getCause());
  break;
} catch (InterruptedException e) {
  ie = null;
  break;
}
  }
{code}

Other parts LGTM. Thanks again for working on this, [~jesse_yates]!
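
To make point 4 above concrete, here is a hypothetical standalone model of 
that loop bound (the names and starting values are mine, not the patch's), 
showing how the guard can give up before every NN has been tried:

{code}
public class RetryBoundDemo {
  public static void main(String[] args) {
    int nnCount = 3;           // stands in for nns.size()
    int maxRetries = 1;
    int nextNN = nnCount - 1;  // start at the last NN, as in the comment
    int tried = 0;
    while (true) {
      // the guard in question: quit once nextNN has wrapped maxRetries times
      if (nextNN / nnCount >= maxRetries) {
        break;
      }
      System.out.println("trying NN index " + (nextNN % nnCount));
      tried++;
      nextNN++;
    }
    // With nextNN starting at 2, only NN 2 is tried before the guard trips
    // at nextNN == 3, so NNs 0 and 1 are never attempted.
    System.out.println("NNs tried: " + tried + " of " + nnCount);
  }
}
{code}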

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7503) Namenode restart after large deletions can cause slow processReport (due to logging)

2014-12-15 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247285#comment-14247285
 ] 

Arpit Agarwal commented on HDFS-7503:
-

Hi Suresh, that may cause the output from multiple threads to get interleaved, 
since we're no longer synchronized, and make it difficult to parse.
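
As a purely illustrative sketch (not the actual BlockManager code; the names 
here are invented): one way to keep both properties is to buffer the per-block 
messages while holding the lock and emit them as a single log call after 
releasing it, so the lock is held briefly and the output still cannot 
interleave:

{code}
import java.util.List;

class ReportLogger {
  private final Object fsLock = new Object();

  void processReport(List<String> orphanBlocks) {
    StringBuilder sb = new StringBuilder();
    synchronized (fsLock) {
      for (String blk : orphanBlocks) {
        // record instead of logging inside the lock
        sb.append(blk).append(" does not belong to any file; ");
      }
    }
    // one log call outside the lock: the buffered line is written atomically,
    // so concurrent reports cannot interleave their output
    if (sb.length() > 0) {
      System.out.println("BLOCK* processReport: " + sb);
    }
  }
}
{code}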

 Namenode restart after large deletions can cause slow processReport (due to 
 logging)
 

 Key: HDFS-7503
 URL: https://issues.apache.org/jira/browse/HDFS-7503
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 1.2.1, 2.6.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 1.3.0, 2.6.1

 Attachments: HDFS-7503.branch-1.02.patch, HDFS-7503.branch-1.patch, 
 HDFS-7503.trunk.01.patch, HDFS-7503.trunk.02.patch


 If a large directory is deleted and namenode is immediately restarted, there 
 are a lot of blocks that do not belong to any file. This results in a log:
 {code}
 2014-11-08 03:11:45,584 INFO BlockStateChange 
 (BlockManager.java:processReport(1901)) - BLOCK* processReport: 
 blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file.
 {code}
 This log is printed within FSNamsystem lock. This can cause namenode to take 
 long time in coming out of safemode.
 One solution is to downgrade the logging level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7484:
-
Attachment: HDFS-7484.002.patch

 Simplify the workflow of calculating permission in mkdirs()
 ---

 Key: HDFS-7484
 URL: https://issues.apache.org/jira/browse/HDFS-7484
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, 
 HDFS-7484.002.patch


 {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions 
 based on whether {{inheritPermission}} is true. This jira proposes to 
 simplify the workflow and make it explicit for the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247354#comment-14247354
 ] 

Hadoop QA commented on HDFS-7484:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687316/HDFS-7484.002.patch
  against trunk revision a095622.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9041//console

This message is automatically generated.

 Simplify the workflow of calculating permission in mkdirs()
 ---

 Key: HDFS-7484
 URL: https://issues.apache.org/jira/browse/HDFS-7484
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, 
 HDFS-7484.002.patch


 {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions 
 based on whether {{inheritPermission}} is true. This jira proposes to 
 simplify the workflow and make it explicit for the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7484:
-
Attachment: HDFS-7484.003.patch

 Simplify the workflow of calculating permission in mkdirs()
 ---

 Key: HDFS-7484
 URL: https://issues.apache.org/jira/browse/HDFS-7484
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, 
 HDFS-7484.002.patch, HDFS-7484.003.patch


 {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions 
 based on whether {{inheritPermission}} is true. This jira proposes to 
 simplify the workflow and make it explicit for the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247407#comment-14247407
 ] 

Daryn Sharp commented on HDFS-7435:
---

Apologies for the delay; I've been dealing with production issues, including 
block reports, which are becoming a very big issue.

I've studied the way PB is implemented and I'm not sure fragmenting the buffer 
will add value. Proper encoding (mine is buggy) does not allocate a full 
buffer but uses a tree to hold fragments of the byte string. Decoding the PB 
will use the full byte array of the PB as the backing store for slices (a 
reference to the full array plus offset/length). We've already paid the price 
for a large allocation of the full PB, carried it around in the Call object, 
etc., so extracting the field is essentially free. Whether it's one field or 
more is irrelevant.

I'm trying to performance-test a patch that internally segments the 
BlockListAsLongs and correctly outputs the byte buffer.
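
As background for why the repeated-long encoding in the description below is 
expensive, a small self-contained illustration (my own example, not the patch):

{code}
import java.util.ArrayList;
import java.util.List;

public class RepeatedLongCost {
  public static void main(String[] args) {
    int replicas = 100_000;
    int longsPerReplica = 3;  // block id, length, generation stamp

    // What a decoded repeated field looks like: every long is boxed, and
    // the backing array grows repeatedly from the default capacity of 10.
    List<Long> boxed = new ArrayList<>();
    for (long i = 0; i < (long) replicas * longsPerReplica; i++) {
      boxed.add(i);
    }

    // A primitive array sized up front needs one allocation and no boxing.
    long[] primitive = new long[replicas * longsPerReplica];
    for (int i = 0; i < primitive.length; i++) {
      primitive[i] = i;
    }
    System.out.println(boxed.size() + " boxed vs " + primitive.length
        + " primitive longs");
  }
}
{code}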

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7528:


 Summary: Consolidate symlink-related implementation into a single 
class
 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


The jira proposes to consolidate symlink-related implementation into a single 
class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7528:
-
Status: Patch Available  (was: Open)

 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7528:
-
Attachment: HDFS-7528.000.patch

 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7528:
-
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-7416

 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking

2014-12-15 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247475#comment-14247475
 ] 

Arpit Agarwal commented on HDFS-6919:
-

Colin, do you plan to address this for 2.7?

 Enforce a single limit for RAM disk usage and replicas cached via locking
 -

 Key: HDFS-6919
 URL: https://issues.apache.org/jira/browse/HDFS-6919
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Arpit Agarwal
Assignee: Colin Patrick McCabe
Priority: Blocker

 The DataNode can have a single limit for memory usage which applies to both 
 replicas cached via CCM and replicas on RAM disk.
 See comments 
 [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025],
  
 [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245]
  and 
 [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575]
  for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6425:
--
Attachment: HDFS-6425-3.patch

Thanks, Kihwal.

Here is the updated patch for trunk, based on a slightly different version. In 
rescanPostponedMisreplicatedBlocks, instead of always picking the first 
blocksPerRescan blocks, the new version randomly selects blocksPerRescan 
consecutive blocks. This handles the case where, for some reason, some 
datanodes remain in a content-stale state for a long time and would otherwise 
keep the scan stuck on the first blocksPerRescan blocks.

This new version has been running on our production clusters for a couple of 
months.
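
A minimal sketch of the windowing idea as I understand it (hypothetical names, 
simplified from the attached patch):

{code}
import java.util.List;
import java.util.Random;

class RescanWindow {
  // Pick a random window of blocksPerRescan consecutive entries instead of
  // always scanning from the head of the postponed list.
  static <T> List<T> pickWindow(List<T> postponed, int blocksPerRescan,
      Random rng) {
    if (postponed.size() <= blocksPerRescan) {
      return postponed;
    }
    // Every start offset is equally likely, so blocks near the head can no
    // longer starve the rest when some datanodes stay content-stale.
    int start = rng.nextInt(postponed.size() - blocksPerRescan + 1);
    return postponed.subList(start, start + blocksPerRescan);
  }
}
{code}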

Regarding the root cause of over-replication: we did some analysis a while 
back. It could be due to the IBR scenario you mentioned. There are also other 
sources:

1. The load balancer can create spikes of over-replication in our clusters.
2. As part of the machine repair process, we used to bring unformatted machines 
back into the cluster.
3. Right after the NN starts up and leaves safe mode, but before all DNs have 
sent block reports, the NN will consider some blocks under-replicated and start 
the replication process. Later, after the remaining DNs send their block 
reports, the NN ends up in an over-replicated situation.

 Large postponedMisreplicatedBlocks has impact on blockReport latency
 

 Key: HDFS-6425
 URL: https://issues.apache.org/jira/browse/HDFS-6425
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
 HDFS-6425-Test-Case.pdf, HDFS-6425.patch


 Sometimes we have large number of over replicates when NN fails over. When 
 the new active NN took over, over replicated blocks will be put to 
 postponedMisreplicatedBlocks until all DNs for that block aren't stale 
 anymore.
 We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
 became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
 just kept increasing until the cluster is stable. 
 In addition, large postponedMisreplicatedBlocks could make 
 rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
 takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247540#comment-14247540
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687303/hdfs-7411.005.patch
  against trunk revision e597249.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9040//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9040//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247558#comment-14247558
 ] 

Andrew Wang commented on HDFS-7411:
---

This is a first: Jenkins says everything is +1 but gives a -1 overall. 
Buildbot must have it out for me :)

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247563#comment-14247563
 ] 

Ravi Prakash commented on HDFS-7411:


Lolz! I'm wondering if the Jenkins jobs are running in isolated workspaces at 
all. It'd explain a lot.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-7484:
---

Assignee: Jing Zhao  (was: Haohui Mai)

 Simplify the workflow of calculating permission in mkdirs()
 ---

 Key: HDFS-7484
 URL: https://issues.apache.org/jira/browse/HDFS-7484
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Jing Zhao
 Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, 
 HDFS-7484.002.patch, HDFS-7484.003.patch


 {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions 
 based on whether {{inheritPermission}} is true. This jira proposes to 
 simplify the workflow and make it explicit for the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7484:

Attachment: HDFS-7484.004.patch

Continuing Haohui's work: based on the 003 patch, this adds a new method 
{{INodesInPath#getExistingINodes}} that returns all the existing INodes of a 
given INodesInPath instance (i.e., trimming all the null elements), and uses 
this method to simplify the mkdir logic.
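
A minimal sketch of what such a trimming helper might look like (the field 
name and array-based representation are assumptions for illustration, not 
necessarily what the patch does):
{code}
// Hypothetical sketch: return the contiguous non-null prefix of the
// resolved inodes, assuming the resolved path components are stored in
// an INode[] where components that do not exist yet are null.
INode[] getExistingINodes() {
  int count = 0;
  while (count < inodes.length && inodes[count] != null) {
    count++;
  }
  INode[] existing = new INode[count];
  System.arraycopy(inodes, 0, existing, 0, count);
  return existing;
}
{code}
With a helper like this, the mkdir logic can start from the deepest existing 
ancestor directly instead of skipping null entries at each call site.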

 Simplify the workflow of calculating permission in mkdirs()
 ---

 Key: HDFS-7484
 URL: https://issues.apache.org/jira/browse/HDFS-7484
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Jing Zhao
 Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, 
 HDFS-7484.002.patch, HDFS-7484.003.patch, HDFS-7484.004.patch


 {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions 
 based on whether {{inheritPermission}} is true. This jira proposes to 
 simplify the workflow and make it explicit for the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-7529:


 Summary: Consolidate encryption zone related implementation into a 
single class
 Key: HDFS-7529
 URL: https://issues.apache.org/jira/browse/HDFS-7529
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


This jira proposes to consolidate the encryption zone related implementation 
into a single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2014-12-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247622#comment-14247622
 ] 

Andrew Wang commented on HDFS-6673:
---

Hey Eddy, thanks for working on this. It looks good, I don't see any major 
issues, just some small nits:

* Needs a small rebase
* DelimitedImageViewer class javadoc typos: included -> included in, can 
be -> can be set
* TextWriterImageViewer javadoc: IOExceptions -> IOException
* We need some unit tests. I bet we have some for the old image viewer which 
could be revived.

Since the purpose of this is all about large fsimages, have you tested this 
with a large fsimage and checked the memory usage / performance? Also curious 
how we tune the LevelDB caches and write buffers, as described here: 
https://code.google.com/p/leveldb/source/browse/include/leveldb/options.h

I think since LevelDB does its own write caching, we could also remove that 
TODO about write batching.
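
For reference, a hedged sketch of how those two knobs might be set through 
the Java LevelDB bindings (this assumes the iq80 leveldb / leveldbjni API, 
and the sizes are placeholders, not recommendations):
{code}
import java.io.File;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

public class OivLevelDbTuning {
  public static void main(String[] args) throws Exception {
    // A larger write buffer batches more entries in memory before each
    // flush; a larger block cache cuts repeated disk reads while the
    // tool walks back over inode records.
    Options options = new Options()
        .createIfMissing(true)
        .writeBufferSize(64 << 20)   // 64 MB memtable (placeholder)
        .cacheSize(256 << 20);       // 256 MB block cache (placeholder)
    DB db = factory.open(new File("/tmp/oiv-scratch"), options);
    db.close();
  }
}
{code}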

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7530) Allow renaming an Encryption Zone root

2014-12-15 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7530:
--

 Summary: Allow renaming an Encryption Zone root
 Key: HDFS-7530
 URL: https://issues.apache.org/jira/browse/HDFS-7530
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor


It should be possible to do

hdfs dfs -mv /ezroot /newnameforezroot




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking

2014-12-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247657#comment-14247657
 ] 

Colin Patrick McCabe commented on HDFS-6919:


I'm not sure if I will have time for this in the 2.7 timeframe. If you'd like 
to take this jira then please do.

 Enforce a single limit for RAM disk usage and replicas cached via locking
 -

 Key: HDFS-6919
 URL: https://issues.apache.org/jira/browse/HDFS-6919
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Arpit Agarwal
Assignee: Colin Patrick McCabe
Priority: Blocker

 The DataNode can have a single limit for memory usage which applies to both 
 replicas cached via CCM and replicas on RAM disk.
 See comments 
 [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025],
 [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245]
 and 
 [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575]
  for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247671#comment-14247671
 ] 

Hadoop QA commented on HDFS-7484:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687318/HDFS-7484.003.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestReplication
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot
  
org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController
  org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace
  
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
  org.apache.hadoop.hdfs.TestSafeMode
  org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache
  org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
  org.apache.hadoop.hdfs.qjournal.TestNNWithQJM
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
  org.apache.hadoop.hdfs.TestFileAppendRestart
  
org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAFsck
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
  org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat
  org.apache.hadoop.hdfs.TestSetTimes
  org.apache.hadoop.hdfs.server.namenode.TestAddBlock
  org.apache.hadoop.hdfs.TestPersistBlocks
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
  org.apache.hadoop.security.TestPermissionSymlinks
  org.apache.hadoop.hdfs.TestDFSRollback
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots
  org.apache.hadoop.hdfs.TestDFSClientFailover
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA
  org.apache.hadoop.hdfs.TestDFSClientRetries
  org.apache.hadoop.hdfs.server.namenode.TestFSImage
  org.apache.hadoop.hdfs.server.namenode.TestCreateEditsLog
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
  org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
  org.apache.hadoop.hdfs.web.TestWebHDFS
  org.apache.hadoop.hdfs.TestDecommission
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
  org.apache.hadoop.hdfs.TestBlockStoragePolicy
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots
  org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite
  org.apache.hadoop.hdfs.server.namenode.TestEditLogRace
  org.apache.hadoop.hdfs.TestDFSUpgrade
  

[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247680#comment-14247680
 ] 

Hadoop QA commented on HDFS-7528:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687344/HDFS-7528.000.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1221 javac 
compiler warnings (more than the trunk's current 1 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
49 warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/9043//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9043//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9043//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9043//console

This message is automatically generated.

 Consolidate symlink-related implementation into a single class
 --

 Key: HDFS-7528
 URL: https://issues.apache.org/jira/browse/HDFS-7528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7528.000.patch


 The jira proposes to consolidate symlink-related implementation into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247695#comment-14247695
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687355/HDFS-6425-3.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9044//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9044//console

This message is automatically generated.

 Large postponedMisreplicatedBlocks has impact on blockReport latency
 

 Key: HDFS-6425
 URL: https://issues.apache.org/jira/browse/HDFS-6425
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
 HDFS-6425-Test-Case.pdf, HDFS-6425.patch


 Sometimes we have a large number of over replicated blocks when the NN 
 fails over. When the new active NN takes over, over replicated blocks are 
 put into postponedMisreplicatedBlocks until all the DNs for a block are no 
 longer stale.
 We have a case where the NNs flip flop. Before postponedMisreplicatedBlocks 
 became empty, the NN failed over again and again, so 
 postponedMisreplicatedBlocks just kept increasing until the cluster became 
 stable. 
 In addition, a large postponedMisreplicatedBlocks set can make 
 rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
 takes the write lock, so it can slow down block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247716#comment-14247716
 ] 

Hadoop QA commented on HDFS-7494:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12686991/hdfs-7494-002.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9045//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9045//console

This message is automatically generated.

 Checking of closed in DFSInputStream#pread() should be protected by 
 synchronization
 ---

 Key: HDFS-7494
 URL: https://issues.apache.org/jira/browse/HDFS-7494
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch


 {code}
   private int pread(long position, byte[] buffer, int offset, int length)
   throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
 {code}
 Checking of closed should be protected by holding the lock on 
 DFSInputStream.this.
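
A minimal sketch of the proposed change (hedged; the exact form in the 
attached patches may differ):
{code}
private int pread(long position, byte[] buffer, int offset, int length)
    throws IOException {
  // sanity checks
  dfsClient.checkOpen();
  // Hold the DFSInputStream lock while reading the closed flag so a
  // concurrent close() is observed consistently.
  synchronized (this) {
    if (closed) {
      throw new IOException("Stream closed");
    }
  }
  // ... rest of pread unchanged ...
{code}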



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7531) Improve the concurrent access of FsVolumeList

2014-12-15 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-7531:
---

 Summary: Improve the concurrent access of FsVolumeList
 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu


{{FsVolumeList}} uses {{synchronized}} to protect the update on 
{{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
{{getAvailable()}}) iterate {{volumes}} without protection.

This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
provide better concurrent access.
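
A hedged sketch of the copy-on-write pattern this implies (the class and 
method names are illustrative, not the actual patch):
{code}
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Readers take a lock-free snapshot of the array; writers publish a
// whole new array with compareAndSet, retrying on contention.
class VolumeListSketch<V> {
  private final AtomicReference<V[]> volumes;

  VolumeListSketch(V[] initial) {
    this.volumes = new AtomicReference<>(initial);
  }

  V[] snapshot() {
    return volumes.get();  // safe to iterate: the array is never mutated
  }

  void addVolume(V v) {
    while (true) {
      V[] old = volumes.get();
      V[] next = Arrays.copyOf(old, old.length + 1);
      next[old.length] = v;
      if (volumes.compareAndSet(old, next)) {
        return;
      }
    }
  }
}
{code}
Iterating a snapshot never observes a half-applied update, which is exactly 
the gap the description points out for {{checkDirs()}} and {{getAvailable()}}.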



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7531) Improve the concurrent access of FsVolumeList

2014-12-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7531:

Attachment: HDFS-7531.000.patch

This patch changes {{FsVolumeList#volumes}} from a {{volatile 
List<FsVolumeImpl>}} to an {{AtomicReference<FsVolumeImpl[]>}}. 

 Improve the concurrent access of FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7531.000.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7531) Improve the concurrent access of FsVolumeList

2014-12-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7531:

Status: Patch Available  (was: Open)

 Improve the concurrent access of FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7531.000.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7531) Improve the concurrent access on FsVolumeList

2014-12-15 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7531:

Summary: Improve the concurrent access on FsVolumeList  (was: Improve the 
concurrent access of FsVolumeList)

 Improve the concurrent access on FsVolumeList
 -

 Key: HDFS-7531
 URL: https://issues.apache.org/jira/browse/HDFS-7531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7531.000.patch


 {{FsVolumeList}} uses {{synchronized}} to protect the update on 
 {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, 
 {{getAvailable()}}) iterate {{volumes}} without protection.
 This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to 
 provide better concurrent access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2014-12-15 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247834#comment-14247834
 ] 

Haohui Mai commented on HDFS-6673:
--

I'm unsure why this requires sorting and LevelDB. Is there any particular use 
case behind it? Since the delimited output depends heavily on the internal 
implementation details of the fsimage, it makes more sense to just output it 
as-is, going through everything using O(1) space. Even if sorting is required, 
using an external sorting tool like {{sort}} is much more efficient than 
{{LevelDB}}.

On the other hand, I can see quite a bit of value in a LevelDB-based output -- 
if you really want to proceed with this, maybe we can separate it into another 
jira?

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7529:
-
Status: Patch Available  (was: Open)

 Consolidate encryption zone related implementation into a single class
 --

 Key: HDFS-7529
 URL: https://issues.apache.org/jira/browse/HDFS-7529
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7529.000.patch


 This jira proposes to consolidate the encryption zone related implementation 
 into a single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7529) Consolidate encryption zone related implementation into a single class

2014-12-15 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-7529:
-
Attachment: HDFS-7529.000.patch

 Consolidate encryption zone related implementation into a single class
 --

 Key: HDFS-7529
 URL: https://issues.apache.org/jira/browse/HDFS-7529
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-7529.000.patch


 This jira proposes to consolidate the encryption zone related implementation 
 into a single class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2014-12-15 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247847#comment-14247847
 ] 

Lei (Eddy) Xu commented on HDFS-6673:
-

[~andrew.wang] and [~wheat9] Thanks for your comments and reviews.

bq. I'm unsure why this requires sorting and LevelDB. 

The reason I'd like to sort is that we need to build the namespace to get the 
full path of each file, while the INode itself only stores its own filename.

[~andrew.wang] I will add a few unit tests. I have not yet run it on large 
fsimages, but I am working on that. Your other comments will be addressed in 
the meantime.
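
To make that concrete, a hedged sketch of the path-building problem (the maps 
and the root-id constant are illustrative, not the patch's actual structures):
{code}
import java.util.HashMap;
import java.util.Map;

// Each inode record carries only (id, parentId, name), so emitting the
// full path "/a/b/c" for an inode means following parent links upward.
class PathBuilder {
  static final long ROOT_ID = 1L;  // placeholder for the root inode id
  final Map<Long, Long> parentOf = new HashMap<>();   // inodeId -> parentId
  final Map<Long, String> nameOf = new HashMap<>();   // inodeId -> local name

  String fullPath(long inodeId) {
    StringBuilder sb = new StringBuilder();
    for (long id = inodeId; id != ROOT_ID; id = parentOf.get(id)) {
      sb.insert(0, "/" + nameOf.get(id));
    }
    return sb.length() == 0 ? "/" : sb.toString();
  }
}
{code}
On an fsimage too large for these maps to fit on the heap, an on-disk store 
(or pre-sorting the records) has to stand in for them, which is where LevelDB 
comes in.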






 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7523) Setting a socket receive buffer size in DFSClient

2014-12-15 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-7523:

Status: Patch Available  (was: Open)

 Setting a socket receive buffer size in DFSClient
 -

 Key: HDFS-7523
 URL: https://issues.apache.org/jira/browse/HDFS-7523
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-7523-001.txt


 It would be nice to have a socket receive buffer size set while creating the 
 socket from the client (HBase) point of view; in old versions this would be 
 in DFSInputStream, in trunk it seems it should be at:
 {code}
   @Override // RemotePeerFactory
   public Peer newConnectedPeer(InetSocketAddress addr,
       Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
       throws IOException {
     Peer peer = null;
     boolean success = false;
     Socket sock = null;
     try {
       sock = socketFactory.createSocket();
       NetUtils.connect(sock, addr,
           getRandomLocalInterfaceAddr(),
           dfsClientConf.socketTimeout);
       peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
           blockToken, datanodeId);
       peer.setReadTimeout(dfsClientConf.socketTimeout);
 {code}
 e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
 The default socket buffer size on Linux+JDK7 seems to be 8k if I am not 
 wrong; this value is sometimes too small for HBase 64k block reads on a 10G 
 network (at the least, it causes more system calls).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7523) Setting a socket receive buffer size in DFSClient

2014-12-15 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-7523:

Attachment: HDFS-7523-001.txt

Attached a small patch, similar to the write-side code in DFSOutputStream:
{code}
  /**
   * Create a socket for a write pipeline
   * @param first the first datanode
   * @param length the pipeline length
   * @param client client
   * @return the socket connected to the first datanode
   */
  static Socket createSocketForPipeline(final DatanodeInfo first,
      final int length, final DFSClient client) throws IOException {
    final String dnAddr = first.getXferAddr(
        client.getConf().connectToDnViaHostname);
    if (DFSClient.LOG.isDebugEnabled()) {
      DFSClient.LOG.debug("Connecting to datanode " + dnAddr);
    }
    final InetSocketAddress isa = NetUtils.createSocketAddr(dnAddr);
    final Socket sock = client.socketFactory.createSocket();
    final int timeout = client.getDatanodeReadTimeout(length);
    NetUtils.connect(sock, isa, client.getRandomLocalInterfaceAddr(),
        client.getConf().socketTimeout);
    sock.setSoTimeout(timeout);
    sock.setSendBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
{code}
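
The read-side analog would then be a one-line addition in 
{{newConnectedPeer}} before connecting (a sketch of the idea; choosing the 
right constant, or making it configurable, is the open question of this jira):
{code}
sock = socketFactory.createSocket();
// Mirror setSendBufferSize() on the write pipeline: enlarge the receive
// buffer so a 64k HBase block read needs fewer reads on a fast network.
sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
NetUtils.connect(sock, addr,
    getRandomLocalInterfaceAddr(),
    dfsClientConf.socketTimeout);
{code}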

 Setting a socket receive buffer size in DFSClient
 -

 Key: HDFS-7523
 URL: https://issues.apache.org/jira/browse/HDFS-7523
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-7523-001.txt


 It would be nice to have a socket receive buffer size set while creating the 
 socket from the client (HBase) point of view; in old versions this would be 
 in DFSInputStream, in trunk it seems it should be at:
 {code}
   @Override // RemotePeerFactory
   public Peer newConnectedPeer(InetSocketAddress addr,
       Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
       throws IOException {
     Peer peer = null;
     boolean success = false;
     Socket sock = null;
     try {
       sock = socketFactory.createSocket();
       NetUtils.connect(sock, addr,
           getRandomLocalInterfaceAddr(),
           dfsClientConf.socketTimeout);
       peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
           blockToken, datanodeId);
       peer.setReadTimeout(dfsClientConf.socketTimeout);
 {code}
 e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
 The default socket buffer size on Linux+JDK7 seems to be 8k if I am not 
 wrong; this value is sometimes too small for HBase 64k block reads on a 10G 
 network (at the least, it causes more system calls).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7532) dncp_block_verification.log.prev too large

2014-12-15 Thread Arti Wadhwani (JIRA)
Arti Wadhwani created HDFS-7532:
---

 Summary: dncp_block_verification.log.prev too large
 Key: HDFS-7532
 URL: https://issues.apache.org/jira/browse/HDFS-7532
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Arti Wadhwani
Priority: Minor


Hi,

Using Hadoop version: Hadoop 2.0.0-cdh4.7.0

On one datanode, we can see that dncp_block_verification.log.prev is too 
large.

Is it safe to delete this file?

{noformat}
-rw-r--r-- 1 hdfs hdfs 1166438426181 Oct 31 09:34 
dncp_block_verification.log.prev
-rw-r--r-- 1 hdfs hdfs 138576163 Dec 15 22:16 
dncp_block_verification.log.curr
{noformat}

This is similar to HDFS-6114, but that issue is about the 
dncp_block_verification.log.curr file. 



Thanks,
Arti Wadhwani



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails

2014-12-15 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang reassigned HDFS-7471:
---

Assignee: Binglin Chang

 TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
 -

 Key: HDFS-7471
 URL: https://issues.apache.org/jira/browse/HDFS-7471
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Ted Yu
Assignee: Binglin Chang

 From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ :
 {code}
 FAILED:  
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
 Error Message:
 The map of version counts returned by DatanodeManager was not what it was 
 expected to be on iteration 237 expected:<0> but was:<1>
 Stack Trace:
 java.lang.AssertionError: The map of version counts returned by 
 DatanodeManager was not what it was expected to be on iteration 237 
 expected:<0> but was:<1>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk

2014-12-15 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang resolved HDFS-7525.
-
Resolution: Duplicate

 TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in 
 trunk
 ---

 Key: HDFS-7525
 URL: https://issues.apache.org/jira/browse/HDFS-7525
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Reporter: Yongjun Zhang

 https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
 {quote}
 Error Message
 The map of version counts returned by DatanodeManager was not what it was 
 expected to be on iteration 484 expected:<0> but was:<1>
 Stacktrace
 java.lang.AssertionError: The map of version counts returned by 
 DatanodeManager was not what it was expected to be on iteration 484 
 expected:<0> but was:<1>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
 {quote}
 Found by tool proposed in HADOOP-11045:
 {quote}
 [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
 Hadoop-Hdfs-trunk -n 5 | tee bt.log
 Recently FAILED builds in url: 
 https://builds.apache.org//job/Hadoop-Hdfs-trunk
 THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
 as listed below:
 ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
 (2014-12-15 03:30:01)
 Failed test: 
 org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
 Failed test: 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
 Failed test: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
 ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport 
 (2014-12-13 10:32:27)
 Failed test: 
 org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
 ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport 
 (2014-12-13 03:30:01)
 Failed test: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
 ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport 
 (2014-12-11 03:30:01)
 Failed test: 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
 Failed test: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
 Failed test: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
 Among 6 runs examined, all failed tests #failedRuns: testName:
 3: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
 2: 
 org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
 1: 
 org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)