[jira] [Created] (HDFS-7523) Setting a socket receive buffer size in DFSClient
Liang Xie created HDFS-7523: --- Summary: Setting a socket receive buffer size in DFSClient Key: HDFS-7523 URL: https://issues.apache.org/jira/browse/HDFS-7523 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie

It would be nice if we could set a socket receive buffer size while creating the socket from the client (HBase) point of view. In old versions it should go in DFSInputStream; in trunk it seems it should go here:

{code}
@Override // RemotePeerFactory
public Peer newConnectedPeer(InetSocketAddress addr,
    Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
    throws IOException {
  Peer peer = null;
  boolean success = false;
  Socket sock = null;
  try {
    sock = socketFactory.createSocket();
    NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(),
        dfsClientConf.socketTimeout);
    peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this,
        blockToken, datanodeId);
    peer.setReadTimeout(dfsClientConf.socketTimeout);
{code}

e.g. sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE). The default socket receive buffer size on Linux with JDK7 seems to be 8k, if I am not wrong; that value is sometimes too small for HBase's 64k block reads on a 10G network (at the least, it means more system calls).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
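A minimal, self-contained sketch of the proposal above, not the actual DFSClient change: set the receive buffer before connect(), as the JIRA suggests doing in newConnectedPeer(). The constant value (128 KB, mirroring HdfsConstants.DEFAULT_DATA_SOCKET_SIZE) and the method name applyReceiveBuffer are assumptions for illustration; the OS may round or clamp whatever size is requested.

```java
import java.net.Socket;
import java.net.SocketException;

public class ReceiveBufferSketch {
    // Assumed to mirror HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (128 KB);
    // treat the exact value as an assumption, not a quote from the source.
    static final int DEFAULT_DATA_SOCKET_SIZE = 128 * 1024;

    // Set the receive buffer on an unconnected socket, then report the size
    // the OS actually granted (Linux, for example, may double the request).
    static int applyReceiveBuffer(Socket sock, int requested) throws SocketException {
        if (requested > 0) {
            sock.setReceiveBufferSize(requested);
        }
        return sock.getReceiveBufferSize();
    }

    public static void main(String[] args) throws Exception {
        try (Socket sock = new Socket()) {
            int granted = applyReceiveBuffer(sock, DEFAULT_DATA_SOCKET_SIZE);
            System.out.println(granted > 0);
        }
    }
}
```

The call must happen before NetUtils.connect(), since on most platforms the receive window is negotiated at connect time.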
[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed
[ https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246436#comment-14246436 ] Rakesh R commented on HDFS-7414: Yeah [~brahmareddy], the edit log viewer shows the same. I'm suspecting there could be chances of the following:

Namenode got shutdown and can't recover where edit update might be missed - Key: HDFS-7414 URL: https://issues.apache.org/jira/browse/HDFS-7414 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1, 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Blocker

Scenario: A mapreduce job was running. CPU usage crossed 190% on the Datanode and the machine became slow, and the following exception was seen. *Did not get the exact root cause, but with CPU usage that high an edit log update might have been missed... need to dig more... anyone have any thoughts?*

{noformat}
2014-11-20 05:01:18,430 | ERROR | main | Encountered exception on operation CloseOp [length=0, inodeId=0, path=/outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025, replication=2, mtime=1416409309023, atime=1416409290816, blockSize=67108864, blocks=[blk_1073766144_25321, blk_1073766154_25331, blk_1073766160_25337], permissions=mapred:supergroup:rw-r--r--, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=162982] | org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
java.io.FileNotFoundException: File does not exist: /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:741)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:724)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1387)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1459)
2014-11-20 05:01:18,654 | WARN | main | Encountered exception loading fsimage | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:642)
java.io.FileNotFoundException: File does not exist: /outDir2/_temporary/1/_temporary/attempt_1416390004064_0002_m_25_1/part-m-00025
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:409)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:893)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:640)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:519)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:575)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:741)
	at
{noformat}
[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed
[ https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246440#comment-14246440 ] Rakesh R commented on HDFS-7414: Yeah Brahma, the edit log viewer shows the same. I suspect the following two operations occurred concurrently; let me try to reproduce it. Operation 1) An internal lease release occurred and initialized block recovery; this adds the OP_CLOSE entry. Operation 2) The client deleted the file; this adds the OP_DELETE entry.
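The concurrent-operations theory above can be sketched as a toy edit-log replay. This is illustrative only, not HDFS code: the class, the op encoding, and the shortened path are all made up. Once an OP_DELETE removes the inode, a later OP_CLOSE for the same path finds no file, matching the FileNotFoundException seen at NameNode startup.

```java
import java.util.HashSet;
import java.util.Set;

public class EditReplaySketch {
    // Replay ops against a toy namespace (a set of paths). OP_CLOSE on a
    // path that was already deleted fails, mirroring the reported crash.
    static String replay(String[][] ops) {
        Set<String> namespace = new HashSet<>();
        for (String[] op : ops) {
            switch (op[0]) {
                case "OP_ADD":    namespace.add(op[1]); break;
                case "OP_DELETE": namespace.remove(op[1]); break;
                case "OP_CLOSE":
                    if (!namespace.contains(op[1])) {
                        return "File does not exist: " + op[1];
                    }
                    break;
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        String p = "/outDir2/part-m-00025"; // illustrative, shortened path
        System.out.println(replay(new String[][] {
            {"OP_ADD", p}, {"OP_DELETE", p}, {"OP_CLOSE", p}
        })); // prints: File does not exist: /outDir2/part-m-00025
    }
}
```

If the OP_CLOSE came first (close, then delete), replay succeeds, which is why the ordering of the two concurrent operations matters.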
[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed
[ https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246451#comment-14246451 ] Vinayakumar B commented on HDFS-7414: - Looks like you hit HDFS-6825, which was also due to extra OP_CLOSE edits. Check whether you got the stack trace mentioned in HDFS-6825 to confirm it is the same issue.
[jira] [Commented] (HDFS-7414) Namenode got shutdown and can't recover where edit update might be missed
[ https://issues.apache.org/jira/browse/HDFS-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246471#comment-14246471 ] Rakesh R commented on HDFS-7414: Hi Vinay, looking at the [HDFS-6825 comment|https://issues.apache.org/jira/browse/HDFS-6825?focusedCommentId=14098682page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14098682], it looks like a similar case.
[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-7471: - Component/s: test Priority: Major (was: Minor) Affects Version/s: 3.0.0

Uprating to major as this is currently the sole test blocking Jenkins HDFS runs.

TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu

From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ :

{code}
FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect

Error Message:
The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1>

Stack Trace:
java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7480) Namenodes loops on 'block does not belong to any file' after deleting many files
[ https://issues.apache.org/jira/browse/HDFS-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246605#comment-14246605 ] Frode Halvorsen commented on HDFS-7480: --- I will test when 2.6.1 is released..

Namenodes loops on 'block does not belong to any file' after deleting many files Key: HDFS-7480 URL: https://issues.apache.org/jira/browse/HDFS-7480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Environment: CentOS - HDFS-HA (journal), zookeeper Reporter: Frode Halvorsen

A small cluster has 8 servers with 32 GB RAM each. Two are namenodes (HA-configured); six are datanodes (8x3 TB disks configured with RAID as one 21 TB drive). The cluster receives on average 400,000 small files each day. I started archiving (HAR) each day as a separate archive. After deleting the original files for one month, the namenodes started acting up really badly. When restarting them, both the active and passive nodes seem to work OK for some time, but then they start to report a lot of blocks belonging to no files, and the namenode just spins on those messages in a massive loop. If the passive node hits it first, it also influences the active node in such a way that it's no longer possible to archive new files. If the active node also starts in this loop, it suddenly dies without any error message. The only way I'm able to get rid of the problem is to start decommissioning nodes, watching the cluster closely to avoid downtime, and make sure every datanode gets a 'clean' start. After all datanodes have been decommissioned (in turn) and restarted with clean disks, the problem is gone. But if I then delete a lot of files in a short time, the problem starts again... The main problem (I think) is that the receiving and reporting of those blocks takes so many resources that the namenode is too busy to tell the datanodes to delete those blocks. If the active namenode starts on the loop, it does the 'right' thing by telling the datanode to invalidate the block, but the number of blocks is so massive that the namenode doesn't do anything else. Just now, I have about 1200-1400 log entries per second on the passive node. Update: just got the active namenode into the loop; it logs 1000 lines per second: 500 'BlockStateChange: BLOCK* processReport: blk_1080796332_7056241 on x.x.x.x:50010 size 1742 does not belong to any file' and 500 'BlockStateChange: BLOCK* InvalidateBlocks: add blk_1080796332_7056241 to x.x.x.x:50010' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246672#comment-14246672 ] Kihwal Lee commented on HDFS-6425: -- [~mingma] The patch looks good, but it no longer applies to trunk. Can you refresh it?

Large postponedMisreplicatedBlocks has impact on blockReport latency Key: HDFS-6425 URL: https://issues.apache.org/jira/browse/HDFS-6425 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6425-2.patch, HDFS-6425-Test-Case.pdf, HDFS-6425.patch

Sometimes we have a large number of over-replicated blocks when the NN fails over. When the new active NN takes over, over-replicated blocks are put into postponedMisreplicatedBlocks until no DN for that block is stale any more. We have a case where the NNs flip-flop: before postponedMisreplicatedBlocks became empty, the NN failed over again and again, so postponedMisreplicatedBlocks just kept increasing until the cluster stabilized. In addition, a large postponedMisreplicatedBlocks can make rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks takes the write lock, so it can slow down block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246679#comment-14246679 ] Hadoop QA commented on HDFS-6425: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661072/HDFS-6425-2.patch against trunk revision fae3e86. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9037//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246808#comment-14246808 ] Kihwal Lee commented on HDFS-6425: -- Did you have a chance to analyze the cause of the large number of over-replicated blocks? It might be due to the race between completeFile and incremental block reports. If a file is closed with just min_replicas and the replication monitor runs before the rest of the incremental block reports are received, replication will be scheduled, and this will lead to over-replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
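The race described in the comment above can be sketched as a toy timeline. This is not NameNode code; the class, constants, and node names are all illustrative. A file closes once min_replicas have reported, the replication monitor runs before the remaining incremental block reports (IBRs) arrive and schedules extra copies, and the late IBRs then push the block over its replication target.

```java
import java.util.ArrayList;
import java.util.List;

public class OverReplicationRaceSketch {
    static final int TARGET = 3;       // desired replication factor
    static final int MIN_REPLICAS = 1; // replicas needed for completeFile to succeed

    // Returns the final replica count after the racy timeline.
    static int finalReplicaCount() {
        List<String> replicas = new ArrayList<>();
        replicas.add("dn1");                              // first IBR arrives
        boolean closed = replicas.size() >= MIN_REPLICAS; // file closes early
        // Replication monitor runs BEFORE dn2/dn3 report and schedules copies:
        int toSchedule = closed ? TARGET - replicas.size() : 0;
        for (int i = 0; i < toSchedule; i++) {
            replicas.add("extraDn" + i);
        }
        // Late IBRs from the original write pipeline arrive afterwards:
        replicas.add("dn2");
        replicas.add("dn3");
        return replicas.size();
    }

    public static void main(String[] args) {
        System.out.println(finalReplicaCount() > TARGET); // prints true: over-replicated
    }
}
```

With this ordering the block ends up with 5 replicas against a target of 3, which is exactly the kind of over-replication that then lands in postponedMisreplicatedBlocks after a failover.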
[jira] [Commented] (HDFS-7503) Namenode restart after large deletions can cause slow processReport (due to logging)
[ https://issues.apache.org/jira/browse/HDFS-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246809#comment-14246809 ] Suresh Srinivas commented on HDFS-7503: --- A minor nit:

{code}
+        blockLog.info("BLOCK* processReport: " + b + " on " + node + " size " +
+            b.getNumBytes() + " does not belong to any file");
{code}

We can print the repetitive node information, and the information that the block does not belong to any file, outside the for loop.

Namenode restart after large deletions can cause slow processReport (due to logging) Key: HDFS-7503 URL: https://issues.apache.org/jira/browse/HDFS-7503 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.1, 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 1.3.0, 2.6.1 Attachments: HDFS-7503.branch-1.02.patch, HDFS-7503.branch-1.patch, HDFS-7503.trunk.01.patch, HDFS-7503.trunk.02.patch

If a large directory is deleted and the namenode is immediately restarted, there are a lot of blocks that do not belong to any file. This results in a log:

{code}
2014-11-08 03:11:45,584 INFO BlockStateChange (BlockManager.java:processReport(1901)) - BLOCK* processReport: blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file.
{code}

This log is printed within the FSNamesystem lock. This can cause the namenode to take a long time coming out of safemode. One solution is to downgrade the logging level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
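The review suggestion above, hoisting the repeated node address and the "does not belong to any file" text out of the per-block loop, can be sketched as follows. This is an illustration of the idea only; the class and method names are made up and this is not the actual HDFS-7503 patch.

```java
import java.util.Arrays;
import java.util.List;

public class LogHoistSketch {
    // Build one summary line per node instead of one full line per block:
    // the node and the explanatory text appear once, blocks are appended.
    static String summarize(String node, List<String> blocks) {
        StringBuilder sb = new StringBuilder("BLOCK* processReport: on ")
            .append(node).append(", blocks that do not belong to any file:");
        for (String b : blocks) {
            sb.append(' ').append(b); // per-block detail only inside the loop
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(summarize("172.31.44.17:1019",
            Arrays.asList("blk_1074250282_509532", "blk_1074250283_509533")));
    }
}
```

Emitting one line per node instead of one per block also shrinks the time spent logging under the FSNamesystem lock, which is the latency problem this JIRA is about.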
[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246822#comment-14246822 ] Hari Sekhon commented on HDFS-5442: --- MapR's approach to DR is perhaps the best in the Hadoop world right now. MapR-FS takes snapshots and replicates those snapshots to the other site. When a snapshot is fully copied, it is atomically enabled at the other site. This is the best possible scenario for consistency and has worked well, including the built-in scheduling. So perhaps HDFS DR requires two administrative options, depending on what is required:

1. Streaming continuous block replication (inconsistent unless you guarantee block write ordering, which WANdisco does not).
2. Atomic snapshot mirroring + enabling at the other site, like MapR-FS.

I suspect number 2 will require some improvement to HDFS snapshots to allow rolling forward a snapshot at the DR site once it's complete. Also, number 2 allows for schedule changes, i.e. a snapshot copy every 15 minutes, every hour, or every day, so you only get the net changes and not every single intermediate change, which may mean less data copied (although I doubt that in practice, unless people are rewriting/replacing datasets, like HBase compactions). Regardless of the solution, there must be configurable path exclusions, such as for /tmp and other places of intermediate data.

Zero loss HDFS data replication for multiple datacenters Key: HDFS-5442 URL: https://issues.apache.org/jira/browse/HDFS-5442 Project: Hadoop HDFS Issue Type: Improvement Reporter: Avik Dey Assignee: Dian Fu Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf

Hadoop is architected to operate efficiently at scale for normal hardware failures within a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is not designed for, nor deployed in, configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. There are current solutions available for batch replication using data copy/export tools. However, while providing some backup capability for HDFS data, they do not provide the capability to recover all your HDFS data from a datacenter failure and be up and running again, with a fully operational Hadoop cluster in another datacenter, in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero data loss, low latency, high throughput and secure HDFS data replication solution for multiple datacenter setups. Design and code for Phase-1 to follow soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6485) Transfer data from primary cluster to mirror cluster synchronously
[ https://issues.apache.org/jira/browse/HDFS-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246824#comment-14246824 ] Hari Sekhon commented on HDFS-6485: --- Would this be covered by HDFS-5442? Transfer data from primary cluster to mirror cluster synchronously -- Key: HDFS-6485 URL: https://issues.apache.org/jira/browse/HDFS-6485 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jiang, Wenjie Assignee: Jiang, Wenjie Attachments: HDFS-6485.patch In the sync mode of Disaster Recovery, namenode in the primary cluster will return a pipeline including datanodes both in primary and mirror clusters to DFSClient and then DFSClient will write data with the existing HDFS architecture. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246829#comment-14246829 ] Uma Maheswara Rao G commented on HDFS-5442: --- I am on business travel to China from 13th to 21st Dec. Please allow for delayed responses from me during this period. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246835#comment-14246835 ] Konstantin Boudnik commented on HDFS-5442: -- bq. MapR's approach to DR is perhaps the best in the Hadoop world right now. MapR-FS takes snapshots and replicates those snapshots to the other site. It's hardly the best, because snapshots are by definition not real-time, so your DR side is always behind the primary. And in case of a disastrous event you're going to lose not-yet-snapshotted data and data in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7524) TestRetryCacheWithHA.testUpdatePipeline failed in trunk
Yongjun Zhang created HDFS-7524: --- Summary: TestRetryCacheWithHA.testUpdatePipeline failed in trunk Key: HDFS-7524 URL: https://issues.apache.org/jira/browse/HDFS-7524 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ Error Message {quote} After waiting the operation updatePipeline still has not taken effect on NN yet Stacktrace java.lang.AssertionError: After waiting the operation updatePipeline still has not taken effect on NN yet at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testClientRetryWithFailover(TestRetryCacheWithHA.java:1278) at org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline(TestRetryCacheWithHA.java:1176) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect failed in trunk
Yongjun Zhang created HDFS-7525: --- Summary: TestDatanodeManager.testNumVersionsReportedCorrect failed in trunk Key: HDFS-7525 URL: https://issues.apache.org/jira/browse/HDFS-7525 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:0 but was:1 Stacktrace java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: 
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7524) TestRetryCacheWithHA.testUpdatePipeline failed occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7524: Summary: TestRetryCacheWithHA.testUpdatePipeline failed occasionally in trunk (was: TestRetryCacheWithHA.testUpdatePipeline failed in trunk) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7526) SetReplication OutOfMemoryError
Philipp Schuegerl created HDFS-7526: --- Summary: SetReplication OutOfMemoryError Key: HDFS-7526 URL: https://issues.apache.org/jira/browse/HDFS-7526 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Philipp Schuegerl Setting the replication of a HDFS folder recursively can run out of memory. E.g. with a large /var/log directory: hdfs dfs -setrep -R -w 1 /var/log Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOfRange(Arrays.java:2694) at java.lang.String.init(String.java:203) at java.lang.String.substring(String.java:1913) at java.net.URI$Parser.substring(URI.java:2850) at java.net.URI$Parser.parse(URI.java:3046) at java.net.URI.init(URI.java:753) at org.apache.hadoop.fs.Path.initialize(Path.java:203) at org.apache.hadoop.fs.Path.init(Path.java:116) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222) at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268) at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) at 
org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244) at org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7525: Summary: TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk (was: TestDatanodeManager.testNumVersionsReportedCorrect failed in trunk) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7524) TestRetryCacheWithHA.testUpdatePipeline fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7524: Summary: TestRetryCacheWithHA.testUpdatePipeline fails occasionally in trunk (was: TestRetryCacheWithHA.testUpdatePipeline failed occasionally in trunk) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246862#comment-14246862 ] Hari Sekhon commented on HDFS-5442: --- Zero loss is practically impossible unless you do synchronous, high-latency writes to both sites, so neither WANdisco nor MapR can claim zero loss while still performing well, and I've had significant, unmanageable streaming async block replication lag of greater than several dozen minutes (i.e. significant potential data loss) when using a well-known proprietary HDFS add-on... With atomic snapshot mirroring you will at least know to what point in time your data is consistent and can work with that, rather than having to fsck to find out which data has random holes in it from blocks that haven't made it across the low-priority replication yet. For option 1 it would be better if block write ordering could be maintained and replayed at the other site in the same order, for chronological consistency up to the latest DR checkpoint, in case a non-trivial application sitting on top of the filesystem isn't prepared for holes in its data, e.g. WAL logs or distributed SQL databases' redo logs sitting on top of HDFS (some solutions might do their own replication, in which case those paths should be excluded via the configurable path exclusions I mentioned previously). The final thing HDFS DR should have is an administrative, active foreground block repair for off-peak times, to catch up faster by maxing out the bandwidth (or the max bandwidth settings you've specified). Ultimately both option 1 and option 2 should be provided, since each is better for different use cases. Option 2 has been done very well by MapR; option 1 hasn't been done well by anyone I've seen yet, but I'm very eager for this to be done (anyone at Hortonworks reading this??? ;) ). 
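The block-write-ordering idea in option 1 (replay writes at the DR site in the same order for chronological consistency) amounts to buffering out-of-order arrivals and applying them strictly by sequence number. A minimal sketch of that replay discipline; `OrderedReplayer` and its sequence numbering are hypothetical names for illustration, not anything in HDFS:

```python
import heapq

class OrderedReplayer:
    """Apply replicated operations at a DR site in their original order,
    even though they may arrive out of order over the WAN. The mirror is
    therefore always chronologically consistent up to some point in time,
    rather than having arbitrary holes."""

    def __init__(self, apply_fn):
        self.apply_fn = apply_fn  # applies one operation to the mirror
        self.next_seq = 0         # next sequence number we may apply
        self.pending = []         # min-heap of (seq, op) held back

    def receive(self, seq, op):
        heapq.heappush(self.pending, (seq, op))
        # Drain every operation that is now contiguous with what we applied.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            self.apply_fn(ready)
            self.next_seq += 1
```

With this discipline, an op that arrives early simply waits in the heap until its predecessors are applied, which is the property that keeps redo-log-style data on the mirror replayable.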
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7526) SetReplication OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246864#comment-14246864 ] Kihwal Lee commented on HDFS-7526: -- Does recursively listing the same directory also cause an OOM? If it does, it is a known issue; until we fix FsShell to use the new remote iterator-based API, it will continue to be a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
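The iterator-based API mentioned in this discussion streams directory entries instead of materializing status objects for the whole subtree, which is what the recursive listing behind `-setrep -R` effectively does today. A rough local-filesystem analogue of the difference; `iter_paths` and `set_replication_all` are illustrative names for this sketch, not FsShell code:

```python
import os

def iter_paths(root):
    """Lazy, iterator-style recursive listing: only one directory's
    entries plus a stack of pending directory names are in memory at a
    time, instead of objects for every file in the tree (the eager
    materialization is what drives the OOM on huge directories)."""
    stack = [root]
    while stack:
        d = stack.pop()
        with os.scandir(d) as entries:
            for e in entries:
                if e.is_dir(follow_symlinks=False):
                    stack.append(e.path)
                else:
                    yield e.path

def set_replication_all(root, set_replication):
    # Stream file paths one at a time to the per-file action; peak memory
    # is proportional to directory width/depth, not total file count.
    for p in iter_paths(root):
        set_replication(p)
```

The client-side fix suggested here follows the same shape: fetch a page of listing results, process it, and discard it before fetching the next page.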
[jira] [Created] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
Yongjun Zhang created HDFS-7527: --- Summary: TestDecommission.testIncludeByRegistrationName fails occasionally in trunk Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2:
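The per-test failure tally at the end of the tool's output can be modeled in a few lines of Java. This is an illustrative sketch, not the determine-flaky-tests-hadoop.py implementation; test names are abbreviated from the report above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlakyTestTally {
    // Count, per test name, how many of the examined runs it failed in.
    static Map<String, Integer> tally(List<List<String>> failedPerRun) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> run : failedPerRun)
            for (String test : run)
                counts.merge(test, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Failed tests from the four failing builds listed above (abbreviated).
        List<List<String>> runs = List.of(
            List.of("TestDecommission", "TestDatanodeManager", "TestRetryCacheWithHA"),
            List.of("TestDecommission"),
            List.of("TestRetryCacheWithHA"),
            List.of("TestDatanodeManager", "TestRetryCacheWithHA", "TestPipelinesFailover"));
        tally(runs).forEach((t, n) -> System.out.println(n + ": " + t));
    }
}
```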
[jira] [Updated] (HDFS-7513) HDFS inotify: add defaultBlockSize to CreateEvent
[ https://issues.apache.org/jira/browse/HDFS-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7513: --- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) HDFS inotify: add defaultBlockSize to CreateEvent - Key: HDFS-7513 URL: https://issues.apache.org/jira/browse/HDFS-7513 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7513.001.patch, HDFS-7513.002.patch, HDFS-7513.003.patch HDFS inotify: add defaultBlockSize to CreateEvent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7506) Consolidate implementation of setting inode attributes into a single class
[ https://issues.apache.org/jira/browse/HDFS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14246975#comment-14246975 ] Jing Zhao commented on HDFS-7506: - +1 Consolidate implementation of setting inode attributes into a single class -- Key: HDFS-7506 URL: https://issues.apache.org/jira/browse/HDFS-7506 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7506.000.patch, HDFS-7506.001.patch, HDFS-7506.001.patch, HDFS-7506.002.patch, HDFS-7506.003.patch This jira proposes to consolidate the implementation of setting inode attributes (i.e., times, permissions, owner, etc.) to a single class for better maintainability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2256) we should add a wait for non-safe mode and call dfsadmin -report in start-dfs
[ https://issues.apache.org/jira/browse/HDFS-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-2256. Resolution: Won't Fix Closing this as won't fix. we should add a wait for non-safe mode and call dfsadmin -report in start-dfs - Key: HDFS-2256 URL: https://issues.apache.org/jira/browse/HDFS-2256 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Owen O'Malley Assignee: Owen O'Malley I think we should add a call to wait for safe mode exit and print the dfs report to show upgrades that are in progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hdfs-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7516: - Attachment: HDFS-7516.002.patch Uploaded a new patch to address Haohui's comment. Fix findbugs warnings in hdfs-nfs project - Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-6239) start-dfs.sh does not start remote DataNode due to escape characters
[ https://issues.apache.org/jira/browse/HDFS-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-6239. Resolution: Won't Fix Hadoop 1.x is dead and trunk/3.x has completely different code for this now. Closing as won't fix. start-dfs.sh does not start remote DataNode due to escape characters Key: HDFS-6239 URL: https://issues.apache.org/jira/browse/HDFS-6239 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 1.2.1 Environment: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu) Linux foo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64 x86_64 GNU/Linux AFS file system. Reporter: xyzzy start-dfs.sh fails to start remote data nodes and task nodes, though it is possible to start them manually through hadoop-daemon.sh. I've been able to debug and find the root cause of the bug; I thought it would be a trivial fix, but I can't figure out how to handle it. hadoop-daemons.sh calls slaves.sh: exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@" This is the issue when I debug using bash -x: In slaves.sh, the \; becomes ';' + ssh .xx..xxx cd /afs/xx..xxx/x/x/x/xx/x/libexec/.. ';' /afs/xx..xxx/x/x/x/xx//bin/hadoop-daemon.sh --config /afs/xx..xxx/x/x/x/xx//libexec/../conf start datanode The problem is the ';'. Because the semicolon is surrounded by quotes, the command after it is never executed. I manually ran the above command, and as expected the data node did not start. When I removed the quotes around the semicolon, everything worked. Please note that you can see the issue only when you do bash -x. If you echo the statement, the quotes around the semicolon are not visible. This issue is always reproducible for me, and because of it, I have to manually start daemons on each machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
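The quoting behavior xyzzy describes is easy to reproduce outside Hadoop. The sketch below (illustrative, not part of the Hadoop scripts) runs the same command line through `sh -c` twice: with a bare semicolon both commands run, while a quoted ';' becomes a literal argument to the first command, so the second never executes.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class QuotedSemicolon {
    // Run a command line through sh -c and capture its combined output.
    static String run(String cmd) throws Exception {
        Process p = new ProcessBuilder("sh", "-c", cmd).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) out.append(line).append('\n');
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Bare ';' is a command separator: both echos run.
        System.out.print(run("echo first ; echo second"));
        // Quoted ';' is a literal word passed to the first echo:
        // "echo second" becomes arguments, not a second command.
        System.out.print(run("echo first ';' echo second"));
    }
}
```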
[jira] [Updated] (HDFS-2628) Remove Mapred filenames from HDFS findbugsExcludeFile.xml file
[ https://issues.apache.org/jira/browse/HDFS-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-2628: --- Component/s: (was: scripts) test Remove Mapred filenames from HDFS findbugsExcludeFile.xml file -- Key: HDFS-2628 URL: https://issues.apache.org/jira/browse/HDFS-2628 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Uma Maheswara Rao G Priority: Minor Mapreduce filenames are there in hadoop-hdfs-project\hadoop-hdfs\dev-support\findbugsExcludeFile.xml. Is it intentional? I think we should remove them from HDFS. Example: {code}
<!-- Ignore warnings where child class has the same name as super class.
     Classes based on Old API shadow names from new API. Should go off after HADOOP-1.0 -->
<Match>
  <Class name="~org.apache.hadoop.mapred.*" />
  <Bug pattern="NM_SAME_SIMPLE_NAME_AS_SUPERCLASS" />
</Match>
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7513) HDFS inotify: add defaultBlockSize to CreateEvent
[ https://issues.apache.org/jira/browse/HDFS-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247000#comment-14247000 ] Hudson commented on HDFS-7513: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6718 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6718/]) HDFS-7513. HDFS inotify: add defaultBlockSize to CreateEvent (cmccabe) (cmccabe: rev 6e13fc62e1f284f22fd0089f06ce281198bc7c2a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/inotify/Event.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/InotifyFSEditLogOpTranslator.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/inotify.proto HDFS inotify: add defaultBlockSize to CreateEvent - Key: HDFS-7513 URL: https://issues.apache.org/jira/browse/HDFS-7513 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7513.001.patch, HDFS-7513.002.patch, HDFS-7513.003.patch HDFS inotify: add defaultBlockSize to CreateEvent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-4063) Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.
[ https://issues.apache.org/jira/browse/HDFS-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-4063. Resolution: Won't Fix Closing this as won't fix. Hadoop 1.x is dead and this code has been removed from modern versions of Hadoop. Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script. Key: HDFS-4063 URL: https://issues.apache.org/jira/browse/HDFS-4063 Project: Hadoop HDFS Issue Type: Bug Components: scripts, tools Affects Versions: 1.0.3, 1.1.0, 2.0.2-alpha Environment: Fedora 17 3.3.4-5.fc17.x86_64t, java version 1.7.0_06-icedtea, Rackspace Cloud (NextGen) Reporter: Haoquan Wang Priority: Minor Labels: patch Original Estimate: 1h Remaining Estimate: 1h The JAVA_HOME directory remains unchanged no matter what you enter when you run hadoop-setup-conf.sh to generate hadoop configurations. Please see below example: * [root@hadoop-slave ~]# /sbin/hadoop-setup-conf.sh Setup Hadoop Configuration Where would you like to put config directory? (/etc/hadoop) Where would you like to put log directory? (/var/log/hadoop) Where would you like to put pid directory? (/var/run/hadoop) What is the host of the namenode? (hadoop-slave) Where would you like to put namenode data directory? (/var/lib/hadoop/hdfs/namenode) Where would you like to put datanode data directory? (/var/lib/hadoop/hdfs/datanode) What is the host of the jobtracker? (hadoop-slave) Where would you like to put jobtracker/tasktracker data directory? (/var/lib/hadoop/mapred) Where is JAVA_HOME directory? (/usr/java/default) *+/usr/lib/jvm/jre+* Would you like to create directories/copy conf files to localhost? 
(Y/n) Review your choices: Config directory: /etc/hadoop Log directory : /var/log/hadoop PID directory : /var/run/hadoop Namenode host : hadoop-slave Namenode directory : /var/lib/hadoop/hdfs/namenode Datanode directory : /var/lib/hadoop/hdfs/datanode Jobtracker host : hadoop-slave Mapreduce directory : /var/lib/hadoop/mapred Task scheduler : org.apache.hadoop.mapred.JobQueueTaskScheduler JAVA_HOME directory : *+/usr/java/default+* Create dirs/copy conf files : y Proceed with generate configuration? (y/N) n User aborted setup, exiting... * Resolution: Amending line 509 of /sbin/hadoop-setup-conf.sh from: JAVA_HOME=${USER_USER_JAVA_HOME:-$JAVA_HOME} to: JAVA_HOME=${USER_JAVA_HOME:-$JAVA_HOME} will resolve this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
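The root cause above is the shell's ${VAR:-default} fallback reading a misspelled, never-set variable, so the default always wins. A minimal Java model of that expansion (the `orDefault` helper is invented for illustration; the variable names mirror the script):

```java
import java.util.HashMap;
import java.util.Map;

public class JavaHomeFallback {
    // Models the shell's ${VAR:-default} expansion: the user's value if set and
    // non-empty, otherwise the default.
    static String orDefault(Map<String, String> env, String var, String def) {
        String v = env.get(var);
        return (v == null || v.isEmpty()) ? def : v;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("USER_JAVA_HOME", "/usr/lib/jvm/jre"); // what the user typed at the prompt
        // Buggy line 509 reads the misspelled USER_USER_JAVA_HOME, which is never
        // set, so the default always wins and the user's answer is ignored.
        System.out.println(orDefault(env, "USER_USER_JAVA_HOME", "/usr/java/default"));
        // The fix reads the variable that was actually populated.
        System.out.println(orDefault(env, "USER_JAVA_HOME", "/usr/java/default"));
    }
}
```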
[jira] [Updated] (HDFS-7506) Consolidate implementation of setting inode attributes into a single class
[ https://issues.apache.org/jira/browse/HDFS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7506: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks for the review. Consolidate implementation of setting inode attributes into a single class -- Key: HDFS-7506 URL: https://issues.apache.org/jira/browse/HDFS-7506 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7506.000.patch, HDFS-7506.001.patch, HDFS-7506.001.patch, HDFS-7506.002.patch, HDFS-7506.003.patch This jira proposes to consolidate the implementation of setting inode attributes (i.e., times, permissions, owner, etc.) to a single class for better maintainability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7506) Consolidate implementation of setting inode attributes into a single class
[ https://issues.apache.org/jira/browse/HDFS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247023#comment-14247023 ] Hudson commented on HDFS-7506: -- FAILURE: Integrated in Hadoop-trunk-Commit #6719 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6719/]) HDFS-7506. Consolidate implementation of setting inode attributes into a single class. Contributed by Haohui Mai. (wheat9: rev 832ebd8cb63d91b4aa4bfed412b9799b3b9be4a7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java Consolidate implementation of setting inode attributes into a single class -- Key: HDFS-7506 URL: https://issues.apache.org/jira/browse/HDFS-7506 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7506.000.patch, HDFS-7506.001.patch, HDFS-7506.001.patch, HDFS-7506.002.patch, HDFS-7506.003.patch This jira proposes to consolidate the implementation of setting inode attributes (i.e., times, permissions, owner, etc.) to a single class for better maintainability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7516) Fix findbugs warnings in hdfs-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247070#comment-14247070 ] Hadoop QA commented on HDFS-7516: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687278/HDFS-7516.002.patch against trunk revision 6e13fc6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9038//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9038//console This message is automatically generated. Fix findbugs warnings in hdfs-nfs project - Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7516: - Summary: Fix findbugs warnings in hadoop-nfs project (was: Fix findbugs warnings in hdfs-nfs project) Fix findbugs warnings in hadoop-nfs project --- Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7516: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Fix findbugs warnings in hadoop-nfs project --- Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247094#comment-14247094 ] Hudson commented on HDFS-7516: -- ABORTED: Integrated in Hadoop-trunk-Commit #6721 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6721/]) HDFS-7516. Fix findbugs warnings in hdfs-nfs project. Contributed by Brandon Li (brandonli: rev 42d8858c5d237c4d9ab439c570a17b7fcaf781c2) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/CredentialsSys.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/SYMLINK3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/LOOKUP3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/RMDIR3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/LINK3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountResponse.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/CREATE3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/MKDIR3Request.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/REMOVE3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/RENAME3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/MKNOD3Request.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/FileHandle.java Fix findbugs warnings in hadoop-nfs project --- Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: 
Brandon Li Assignee: Brandon Li Fix For: 2.7.0 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7516: - Fix Version/s: 2.7.0 Fix findbugs warnings in hadoop-nfs project --- Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.7.0 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7516) Fix findbugs warnings in hadoop-nfs project
[ https://issues.apache.org/jira/browse/HDFS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247097#comment-14247097 ] Brandon Li commented on HDFS-7516: -- Thank you, [~wheat9], for the review. I've committed the patch. Fix findbugs warnings in hadoop-nfs project --- Key: HDFS-7516 URL: https://issues.apache.org/jira/browse/HDFS-7516 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.7.0 Attachments: HDFS-7516.001.patch, HDFS-7516.002.patch, findbugsXml.xml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7023: --- Fix Version/s: HDFS-6994 Target Version/s: HDFS-6994 Affects Version/s: HDFS-6994 Status: Patch Available (was: In Progress) Committed to branch, thanks for the review. use libexpat instead of libxml2 for libhdfs3 Key: HDFS-7023 URL: https://issues.apache.org/jira/browse/HDFS-7023 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: HDFS-6994 Reporter: Zhanwei Wang Assignee: Colin Patrick McCabe Fix For: HDFS-6994 Attachments: HDFS-7023-pnative.002.patch, HDFS-7023.001.pnative.patch As commented in HDFS-6994, libxml2 may have some thread-safety issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247217#comment-14247217 ] Hadoop QA commented on HDFS-7023: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683901/HDFS-7023-pnative.002.patch against trunk revision e597249. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9039//console This message is automatically generated. use libexpat instead of libxml2 for libhdfs3 Key: HDFS-7023 URL: https://issues.apache.org/jira/browse/HDFS-7023 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: HDFS-6994 Reporter: Zhanwei Wang Assignee: Colin Patrick McCabe Fix For: HDFS-6994 Attachments: HDFS-7023-pnative.002.patch, HDFS-7023.001.pnative.patch As commented in HDFS-6994, libxml2 may have some thread-safety issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7411: -- Attachment: hdfs-7411.005.patch Thanks again Arpit for commenting. The TestOpenFilesWithSnapshot failure was caused by the block report change; some digging shows the NN won't come out of startup safemode. I'm going to defer this fix until later, as it's unrelated to the decom manager refactor itself. New patch attached without the block report change, but with the test improvements. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247283#comment-14247283 ] Lei (Eddy) Xu commented on HDFS-6440: - Hey, [~jesse_yates] Thanks for your answers! I have a few further questions regarding the patch: 1. I did not see where {{isPrimaryCheckPointer}} is set to {{false}}. {code:title=StandbyCheckpointer.java}
private boolean isPrimaryCheckPointer = true;
...
if (upload.get() == TransferFsImage.TransferResult.SUCCESS) {
  this.isPrimaryCheckPointer = true;
  //avoid getting the rest of the results - we don't care since we had a successful upload
  break;
}
{code} I guess the default value of {{isPrimaryCheckPointer}} might be a typo, which should be {{false}}. Moreover, is there a case where the SNN switches from primary checkpointer to non-primary checkpointer? 2. Is the following condition correct? I think only {{sendRequest}} is needed. {code:title=StandbyCheckpointer.java}
if (needCheckpoint && sendRequest) {
{code} Also in the old code, {code}
} else if (secsSinceLast >= checkpointConf.getPeriod()) {
  LOG.info("Triggering checkpoint because it has been " + secsSinceLast +
      " seconds since the last checkpoint, which exceeds the configured interval " +
      checkpointConf.getPeriod());
  needCheckpoint = true;
}
{code} Does it imply that if {{secsSinceLast >= checkpointConf.getPeriod()}} is {{true}} then {{secsSinceLast >= checkpointConf.getQuietPeriod()}} is always {{true}}, for the default {{quiet multiplier}} value? If that is the case, are these duplicated conditions? It might be easier to let the ANN calculate the above conditions, as it has the actual system-wide knowledge of the last upload and last txnid. It could be a nice optimization later. 3. When it uploads the fsimage, are {{SC_CONFLICT}} and {{SC_EXPECTATION_FAILED}} not handled in the SNN in the current patch? Do you plan to handle them in a following patch? 4. Could you set {{EditLogTailer#maxRetries}} to {{private final}}? 
Do we need to enforce an acceptable value range for {{maxRetries}}? For instance, in the following code, it would not try every NN when {{nextNN == nns.size() - 1}} and {{maxRetries == 1}}: {code}
// if we have reached the max loop count, quit by returning null
if (nextNN / nns.size() >= maxRetries) {
  return null;
}
{code} 5. There are a few formatting-only changes, e.g., in {{doCheckpointing()}}. Could you remove them to reduce the size of the patch? Also the following code is indented incorrectly. {code}
int i = 0;
for (; i < uploads.size(); i++) {
  Future<TransferFsImage.TransferResult> upload = uploads.get(i);
  try {
    // TODO should there be some smarts here about retries nodes that are not the active NN?
    if (upload.get() == TransferFsImage.TransferResult.SUCCESS) {
      this.isPrimaryCheckPointer = true;
      //avoid getting the rest of the results - we don't care since we had a successful upload
      break;
    }
  } catch (ExecutionException e) {
    ioe = new IOException("Exception during image upload: " + e.getMessage(), e.getCause());
    break;
  } catch (InterruptedException e) {
    ie = null;
    break;
  }
}
{code} Other parts LGTM. Thanks again for working on this, [~jesse_yates]! Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
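The concern in point 4 can be illustrated with a small model of the loop guard. This is a sketch that assumes the loop increments {{nextNN}} once per attempt; the method name is invented for illustration:

```java
public class MaxRetriesGuard {
    // Models the review's guard: give up once nextNN / numNNs >= maxRetries.
    // Returns how many NN attempts happen before the guard trips.
    static int attemptsBeforeGiveUp(int startNN, int numNNs, int maxRetries) {
        int attempts = 0;
        for (int nextNN = startNN; ; nextNN++) {
            if (nextNN / numNNs >= maxRetries) {
                return attempts; // quit by returning null in the original code
            }
            attempts++; // "try" NN number (nextNN % numNNs)
        }
    }

    public static void main(String[] args) {
        // Starting from NN 0 with 3 NNs and maxRetries = 1: all three are tried.
        System.out.println(attemptsBeforeGiveUp(0, 3, 1));
        // Starting from NN 2 (nns.size() - 1): only one NN is tried before giving up.
        System.out.println(attemptsBeforeGiveUp(2, 3, 1));
    }
}
```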
[jira] [Commented] (HDFS-7503) Namenode restart after large deletions can cause slow processReport (due to logging)
[ https://issues.apache.org/jira/browse/HDFS-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247285#comment-14247285 ] Arpit Agarwal commented on HDFS-7503: - Hi Suresh, that may cause the output from multiple threads to get interleaved, since we're no longer synchronized, and make it difficult to parse. Namenode restart after large deletions can cause slow processReport (due to logging) Key: HDFS-7503 URL: https://issues.apache.org/jira/browse/HDFS-7503 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.1, 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 1.3.0, 2.6.1 Attachments: HDFS-7503.branch-1.02.patch, HDFS-7503.branch-1.patch, HDFS-7503.trunk.01.patch, HDFS-7503.trunk.02.patch If a large directory is deleted and the namenode is immediately restarted, there are a lot of blocks that do not belong to any file. This results in a log: {code}
2014-11-08 03:11:45,584 INFO BlockStateChange (BlockManager.java:processReport(1901)) - BLOCK* processReport: blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file.
{code} This log is printed within the FSNamesystem lock. This can cause the namenode to take a long time to come out of safemode. One solution is to downgrade the logging level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
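One alternative to a per-block INFO line under the lock is to tally the orphaned replicas and emit a single summary afterwards. The sketch below is illustrative only; it is not the actual HDFS-7503 patch, and the `summarize` helper is invented:

```java
import java.util.List;

public class ReportLogging {
    // Tally orphaned replicas during report processing and build one summary
    // line, so a single log call can happen after the namesystem lock is
    // released instead of one INFO line per block while holding it.
    static String summarize(List<String> blocksWithNoFile) {
        int orphaned = blocksWithNoFile.size();
        if (orphaned == 0) {
            return null; // nothing worth logging
        }
        return "BLOCK* processReport: " + orphaned + " block(s) do not belong to any file";
    }

    public static void main(String[] args) {
        System.out.println(summarize(List.of("blk_1074250282_509532", "blk_1074250283_509533")));
    }
}
```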
[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7484: - Attachment: HDFS-7484.002.patch Simplify the workflow of calculating permission in mkdirs() --- Key: HDFS-7484 URL: https://issues.apache.org/jira/browse/HDFS-7484 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, HDFS-7484.002.patch {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions based on whether {{inheritPermission}} is true. This jira proposes to simplify the workflow and make it explicit for the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247354#comment-14247354 ] Hadoop QA commented on HDFS-7484: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687316/HDFS-7484.002.patch against trunk revision a095622. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9041//console This message is automatically generated. Simplify the workflow of calculating permission in mkdirs() --- Key: HDFS-7484 URL: https://issues.apache.org/jira/browse/HDFS-7484 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, HDFS-7484.002.patch {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions based on whether {{inheritPermission}} is true. This jira proposes to simplify the workflow and make it explicit for the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7484: - Attachment: HDFS-7484.003.patch Simplify the workflow of calculating permission in mkdirs() --- Key: HDFS-7484 URL: https://issues.apache.org/jira/browse/HDFS-7484 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, HDFS-7484.002.patch, HDFS-7484.003.patch {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions based on whether {{inheritPermission}} is true. This jira proposes to simplify the workflow and make it explicit for the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247407#comment-14247407 ] Daryn Sharp commented on HDFS-7435: --- Apologies for the delay; I've been dealing with production issues, including block reports, which are becoming a very big issue. I've studied the way PB is implemented and I'm not sure fragmenting the buffer will add value. Proper encoding (mine is buggy) will not allocate a full buffer but uses a tree to hold fragments of the byte string. Decoding the PB will use the full byte array of the PB as the backing store for slices (ref to full array + offset/length). We've already paid the price for a large allocation of the full PB, carried it around in the Call object, etc., so extracting the field is essentially free. Whether it's one or more is irrelevant. I'm trying to performance-test a patch that internally segments the BlockListAsLongs and correctly outputs the byte buffer. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with a default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
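[Editor's note] The allocation pattern described in HDFS-7435 can be sketched outside Hadoop. The toy class below is not the actual BlockListAsLongs or protobuf-generated code (all names are hypothetical); it only contrasts the boxed, growing list that decoding a repeated int64 field effectively produces with the single flat primitive array a block report naturally is:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the cost HDFS-7435 describes: a PB "repeated int64"
// decodes into an ArrayList<Long> that starts at capacity 10 and must
// repeatedly grow and box, while a block report is just a flat long[]
// (3 longs per replica: id, length, generation stamp).
class RepeatedLongCost {
  // The decoding path protobuf effectively takes: boxed, growing list.
  static List<Long> decodeBoxed(long[] wire) {
    List<Long> out = new ArrayList<>();  // default capacity 10
    for (long v : wire) {
      out.add(v);                        // autoboxing + periodic realloc
    }
    return out;
  }

  // What the consumer actually wants: one allocation, no boxing.
  static long[] decodePrimitive(long[] wire) {
    long[] out = new long[wire.length];
    System.arraycopy(wire, 0, out, 0, wire.length);
    return out;
  }

  // Fabricate a wire-format-like array for `replicas` replicas.
  static long[] fakeReport(int replicas) {
    long[] wire = new long[replicas * 3];
    for (int i = 0; i < wire.length; i++) {
      wire[i] = i;
    }
    return wire;
  }
}
```

For a report with hundreds of thousands of replicas, the boxed path allocates one `Long` per value plus every intermediate backing array, which is exactly the overhead the jira wants to avoid.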
[jira] [Created] (HDFS-7528) Consolidate symlink-related implementation into a single class
Haohui Mai created HDFS-7528: Summary: Consolidate symlink-related implementation into a single class Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7528: - Status: Patch Available (was: Open) Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7528: - Attachment: HDFS-7528.000.patch Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7528: - Issue Type: Sub-task (was: Bug) Parent: HDFS-7416 Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking
[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247475#comment-14247475 ] Arpit Agarwal commented on HDFS-6919: - Colin, do you plan to address this for 2.7? Enforce a single limit for RAM disk usage and replicas cached via locking - Key: HDFS-6919 URL: https://issues.apache.org/jira/browse/HDFS-6919 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Arpit Agarwal Assignee: Colin Patrick McCabe Priority: Blocker The DataNode can have a single limit for memory usage which applies to both replicas cached via CCM and replicas on RAM disk. See comments [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] and [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] for discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6425: -- Attachment: HDFS-6425-3.patch Thanks, Kihwal. Here is the updated patch for trunk, based on a slightly different version. In rescanPostponedMisreplicatedBlocks, instead of always picking the first blocksPerRescan blocks, the new version randomly selects blocksPerRescan consecutive blocks. This handles the case where, for some reason, some datanodes remain in the content-stale state for a long time, so that only the first blocksPerRescan blocks would ever be affected. This new version has been running on our production clusters for a couple of months. Regarding the root cause of over-replication: we did some analysis a while back. It could be due to the IBR scenario you mentioned. There are also other sources. 1. The load balancer could create spikes of over-replication in our clusters. 2. As part of the machine repair process, we used to bring unformatted machines back into the cluster. 3. It appears that right after NN startup, once the NN leaves safe mode but before all DNs have sent block reports, the NN will consider some blocks under-replicated and start the replication process. Later, after the remaining DNs send block reports, the NN gets into an over-replicated situation. Large postponedMisreplicatedBlocks has impact on blockReport latency Key: HDFS-6425 URL: https://issues.apache.org/jira/browse/HDFS-6425 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, HDFS-6425-Test-Case.pdf, HDFS-6425.patch Sometimes we have a large number of over-replicated blocks when the NN fails over. When the new active NN takes over, over-replicated blocks are put into postponedMisreplicatedBlocks until all DNs for that block are no longer stale. We have a case where the NNs flip-flop. Before postponedMisreplicatedBlocks became empty, the NN failed over again and again, so postponedMisreplicatedBlocks just kept increasing until the cluster was stable.
In addition, a large postponedMisreplicatedBlocks could make rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks takes the write lock, so it can slow down block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
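[Editor's note] The windowing change Ming Ma describes above can be sketched generically. This is not the attached patch (the method name and generic signature are hypothetical); it only shows the idea of scanning a random contiguous window of blocksPerRescan entries instead of always the first ones, so a scan is never permanently stuck on the same leading blocks:

```java
import java.util.List;
import java.util.Random;

// Sketch of HDFS-6425's rescan windowing: pick a random contiguous
// window of blocksPerRescan entries rather than the first ones, so
// long-lived content-stale datanodes can't pin the scan to the head.
class RescanWindow {
  static <T> List<T> pickWindow(List<T> blocks, int blocksPerRescan, Random rnd) {
    if (blocks.size() <= blocksPerRescan) {
      return blocks;  // small queue: just scan everything
    }
    // Every start offset that still leaves a full window is equally likely.
    int start = rnd.nextInt(blocks.size() - blocksPerRescan + 1);
    return blocks.subList(start, start + blocksPerRescan);
  }
}
```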
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247540#comment-14247540 ] Hadoop QA commented on HDFS-7411: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687303/hdfs-7411.005.patch against trunk revision e597249. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9040//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9040//console This message is automatically generated. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247558#comment-14247558 ] Andrew Wang commented on HDFS-7411: --- This is a first, Jenkins says everything is +1, but gives a -1 overall. Buildbot must have it out for me :) Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247563#comment-14247563 ] Ravi Prakash commented on HDFS-7411: Lolz! I'm wondering if the jenkins jobs are running in isolated workspaces at all. It'd explain a lot Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-7484: --- Assignee: Jing Zhao (was: Haohui Mai) Simplify the workflow of calculating permission in mkdirs() --- Key: HDFS-7484 URL: https://issues.apache.org/jira/browse/HDFS-7484 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, HDFS-7484.002.patch, HDFS-7484.003.patch {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions based on whether {{inheritPermission}} is true. This jira proposes to simplify the workflow and make it explicit for the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7484: Attachment: HDFS-7484.004.patch Continuing Haohui's work: based on the 003 patch, this adds a new method {{INodesInPath#getExistingINodes}} that returns all the existing INodes of a given INodesInPath instance (i.e., trimming all the null elements), and uses this method to simplify the mkdir logic. Simplify the workflow of calculating permission in mkdirs() --- Key: HDFS-7484 URL: https://issues.apache.org/jira/browse/HDFS-7484 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-7484.000.patch, HDFS-7484.001.patch, HDFS-7484.002.patch, HDFS-7484.003.patch, HDFS-7484.004.patch {{FSDirMkdirsOp#mkdirsRecursively()}} currently calculates the permissions based on whether {{inheritPermission}} is true. This jira proposes to simplify the workflow and make it explicit for the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
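[Editor's note] The trimming behavior described for {{INodesInPath#getExistingINodes}} can be sketched in isolation. This is not the patch itself (the class and method names below are hypothetical stand-ins); it assumes, as in INodesInPath, that unresolved path components appear as null entries after the resolved prefix:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a getExistingINodes-style helper: an inode array for
// /a/b/c/d where only /a/b exists looks like [a, b, null, null];
// the helper returns just the resolved (non-null) prefix.
class ExistingPrefix {
  static <T> List<T> existingPrefix(T[] inodes) {
    List<T> out = new ArrayList<>();
    for (T inode : inodes) {
      if (inode == null) {
        break;  // first unresolved component ends the existing prefix
      }
      out.add(inode);
    }
    return out;
  }
}
```

Callers such as a mkdir implementation can then operate on the trimmed list directly instead of re-checking for nulls at every step.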
[jira] [Created] (HDFS-7529) Consolidate encryption zone related implementation into a single class
Haohui Mai created HDFS-7529: Summary: Consolidate encryption zone related implementation into a single class Key: HDFS-7529 URL: https://issues.apache.org/jira/browse/HDFS-7529 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai This jira proposes to consolidate encryption zone related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247622#comment-14247622 ] Andrew Wang commented on HDFS-6673: --- Hey Eddy, thanks for working on this. It looks good, I don't see any major issues, just some small nits: * Needs a small rebase * DelimitedImageViewer class javadoc typos: "included" -> "included in", "can be" -> "can be set" * TextWriterImageViewer javadoc: "IOExceptions" -> "IOException" * We need some unit tests. I bet we have some for the old image viewer which could be revived. Since the purpose of this is all about large fsimages, have you tested this with a large FSImage and checked the memory usage / performance? Also curious how we tune the LevelDB caches and write buffers, as described here: https://code.google.com/p/leveldb/source/browse/include/leveldb/options.h I think since LevelDB does its own write caching, we could also remove that TODO about write batching. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7530) Allow renaming an Encryption Zone root
Charles Lamb created HDFS-7530: -- Summary: Allow renaming an Encryption Zone root Key: HDFS-7530 URL: https://issues.apache.org/jira/browse/HDFS-7530 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor It should be possible to do hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6919) Enforce a single limit for RAM disk usage and replicas cached via locking
[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247657#comment-14247657 ] Colin Patrick McCabe commented on HDFS-6919: I'm not sure if I will have time for this in the 2.7 timeframe. If you'd like to take this jira then please do Enforce a single limit for RAM disk usage and replicas cached via locking - Key: HDFS-6919 URL: https://issues.apache.org/jira/browse/HDFS-6919 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Arpit Agarwal Assignee: Colin Patrick McCabe Priority: Blocker The DataNode can have a single limit for memory usage which applies to both replicas cached via CCM and replicas on RAM disk. See comments [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] and [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] for discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7484) Simplify the workflow of calculating permission in mkdirs()
[ https://issues.apache.org/jira/browse/HDFS-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247671#comment-14247671 ] Hadoop QA commented on HDFS-7484: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687318/HDFS-7484.003.patch against trunk revision a095622. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting org.apache.hadoop.hdfs.TestSafeMode org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader org.apache.hadoop.hdfs.qjournal.TestNNWithQJM org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles org.apache.hadoop.hdfs.TestFileAppendRestart org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade org.apache.hadoop.hdfs.server.namenode.ha.TestHAFsck org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat org.apache.hadoop.hdfs.TestSetTimes org.apache.hadoop.hdfs.server.namenode.TestAddBlock org.apache.hadoop.hdfs.TestPersistBlocks org.apache.hadoop.hdfs.TestEncryptedTransfer org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs org.apache.hadoop.security.TestPermissionSymlinks org.apache.hadoop.hdfs.TestDFSRollback org.apache.hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots org.apache.hadoop.hdfs.TestDFSClientFailover org.apache.hadoop.hdfs.TestFileAppend2 org.apache.hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.namenode.TestFSImage org.apache.hadoop.hdfs.server.namenode.TestCreateEditsLog org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics 
org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion org.apache.hadoop.hdfs.TestBlockStoragePolicy org.apache.hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite org.apache.hadoop.hdfs.server.namenode.TestEditLogRace org.apache.hadoop.hdfs.TestDFSUpgrade
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247680#comment-14247680 ] Hadoop QA commented on HDFS-7528: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687344/HDFS-7528.000.patch against trunk revision a095622. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1221 javac compiler warnings (more than the trunk's current 1 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 49 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/9043//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9043//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9043//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9043//console This message is automatically generated. 
Consolidate symlink-related implementation into a single class -- Key: HDFS-7528 URL: https://issues.apache.org/jira/browse/HDFS-7528 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7528.000.patch The jira proposes to consolidate symlink-related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247695#comment-14247695 ] Hadoop QA commented on HDFS-6425: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687355/HDFS-6425-3.patch against trunk revision a095622. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9044//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9044//console This message is automatically generated. Large postponedMisreplicatedBlocks has impact on blockReport latency Key: HDFS-6425 URL: https://issues.apache.org/jira/browse/HDFS-6425 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, HDFS-6425-Test-Case.pdf, HDFS-6425.patch Sometimes we have large number of over replicates when NN fails over. When the new active NN took over, over replicated blocks will be put to postponedMisreplicatedBlocks until all DNs for that block aren't stale anymore. We have a case where NNs flip flop. Before postponedMisreplicatedBlocks became empty, NN fail over again and again. 
So postponedMisreplicatedBlocks just kept increasing until the cluster was stable. In addition, a large postponedMisreplicatedBlocks could make rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks takes the write lock, so it can slow down block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization
[ https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247716#comment-14247716 ] Hadoop QA commented on HDFS-7494: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686991/hdfs-7494-002.patch against trunk revision a095622. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9045//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9045//console This message is automatically generated. 
Checking of closed in DFSInputStream#pread() should be protected by synchronization --- Key: HDFS-7494 URL: https://issues.apache.org/jira/browse/HDFS-7494 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch {code} private int pread(long position, byte[] buffer, int offset, int length) throws IOException { // sanity checks dfsClient.checkOpen(); if (closed) { {code} Checking of closed should be protected by holding lock on DFSInputStream.this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
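[Editor's note] The race HDFS-7494 describes can be illustrated generically. The class below is not DFSInputStream (names and the trivial read body are stand-ins); it shows one way to make the check well-defined: read the flag under the same monitor that {{close()}} takes. Declaring the field {{volatile}} would be another option:

```java
import java.io.IOException;

// Generic sketch of the HDFS-7494 concern: a plain boolean read outside
// any lock may not observe a concurrent close(). Guarding the flag with
// the same monitor that close() holds makes the check well-defined.
class CloseGuard {
  private boolean closed = false;

  public synchronized void close() {
    closed = true;
  }

  private synchronized void checkOpen() throws IOException {
    if (closed) {
      throw new IOException("Stream closed");
    }
  }

  public int pread(long position, byte[] buffer, int offset, int length)
      throws IOException {
    checkOpen();  // `closed` is now read under the lock
    // Stand-in for the real positional read; returns bytes "read".
    return Math.min(length, buffer.length - offset);
  }
}
```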
[jira] [Created] (HDFS-7531) Improve the concurrent access of FsVolumeList
Lei (Eddy) Xu created HDFS-7531: --- Summary: Improve the concurrent access of FsVolumeList Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7531) Improve the concurrent access of FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7531: Attachment: HDFS-7531.000.patch This patch changes {{FsVolumeList#volumes}} from a {{volatile List<FsVolumeImpl>}} to an {{AtomicReference<FsVolumeImpl[]>}}. Improve the concurrent access of FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7531.000.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7531) Improve the concurrent access of FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7531: Status: Patch Available (was: Open) Improve the concurrent access of FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7531.000.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7531: Summary: Improve the concurrent access on FsVolumeList (was: Improve the concurrent access of FsVolumeList) Improve the concurrent access on FsVolumeList - Key: HDFS-7531 URL: https://issues.apache.org/jira/browse/HDFS-7531 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7531.000.patch {{FsVolumeList}} uses {{synchronized}} to protect the update on {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, {{getAvailable()}}) iterate {{volumes}} without protection. This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
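[Editor's note] The {{AtomicReference}} approach HDFS-7531 proposes follows the standard copy-on-write pattern, sketched below. This is not the attached patch (the class and method names are hypothetical): readers take a consistent snapshot without locking, while writers swap in a freshly copied array via compare-and-set:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Copy-on-write sketch of the HDFS-7531 idea: iteration (the analogue
// of checkDirs()/getAvailable()) works on an immutable snapshot, and
// updates replace the whole array atomically.
class CowVolumeList<V> {
  private final AtomicReference<V[]> volumes;

  CowVolumeList(V[] initial) {
    this.volumes = new AtomicReference<>(initial);
  }

  V[] snapshot() {
    return volumes.get();  // readers iterate this without any lock
  }

  void addVolume(V v) {
    while (true) {
      V[] cur = volumes.get();
      V[] next = Arrays.copyOf(cur, cur.length + 1);
      next[cur.length] = v;
      if (volumes.compareAndSet(cur, next)) {
        return;  // CAS failed => another writer won; retry on a fresh copy
      }
    }
  }
}
```

The trade-off is that each update copies the array, which is cheap here because the volume list is small and rarely modified compared to how often it is iterated.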
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247834#comment-14247834 ] Haohui Mai commented on HDFS-6673: -- I'm unsure why this requires sorting and LevelDB. Is there a particular use case behind it? Since the delimited output depends heavily on the internal implementation details of the fsimage, it makes more sense to just output it as-is, going through everything using O(1) space. Even if sorting is required, using an external sorting tool like {{sort}} is much more efficient than {{LevelDB}}. On the other hand, I can see quite a bit of value in LevelDB-based output -- if you really want to proceed with this, maybe we can separate it into another jira? Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7529: - Status: Patch Available (was: Open) Consolidate encryption zone related implementation into a single class -- Key: HDFS-7529 URL: https://issues.apache.org/jira/browse/HDFS-7529 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7529.000.patch This jira proposes to consolidate the encryption zone related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7529: - Attachment: HDFS-7529.000.patch Consolidate encryption zone related implementation into a single class -- Key: HDFS-7529 URL: https://issues.apache.org/jira/browse/HDFS-7529 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7529.000.patch This jira proposes to consolidate the encryption zone related implementation into a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247847#comment-14247847 ] Lei (Eddy) Xu commented on HDFS-6673: - [~andrew.wang] and [~wheat9] Thanks for your comments and reviews. bq. I'm unsure why this requires sorting and LevelDB. The reason I'd like sorting is that we need to build the namespace to get the full path of each file, while the INode only contains the filename. [~andrew.wang] I will add a few unit tests to it. I have not yet run it on large fsimages and I am working on it. Your other comments will be addressed in the meantime. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
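The problem Eddy describes above (an INode record carries only its own name plus a parent id, so full paths must be reassembled) can be sketched as below. This is an illustrative toy, not the oiv tool's actual data model; the class, the id-0-as-root convention, and the in-memory maps are assumptions for the example (the real fsimage may be too large for in-memory maps, which is what motivates sorting or an external store):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: rebuild full paths from (id, parentId, name) records.
class PathBuilder {
    private final Map<Long, Long> parentOf = new HashMap<>();
    private final Map<Long, String> nameOf = new HashMap<>();

    // Record one inode; id 0 is treated as the root directory here.
    void addInode(long id, long parentId, String name) {
        parentOf.put(id, parentId);
        nameOf.put(id, name);
    }

    // Walk parent links up to the root, prepending each component.
    String fullPath(long id) {
        StringBuilder sb = new StringBuilder();
        while (id != 0) {
            sb.insert(0, "/" + nameOf.get(id));
            id = parentOf.get(id);
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }
}
```

Sorting the records by parent id (or keying them in LevelDB) serves the same purpose when the maps do not fit in memory: it lets the tool resolve each directory's children without random lookups over the whole image.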
[jira] [Updated] (HDFS-7523) Setting a socket receive buffer size in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-7523: Status: Patch Available (was: Open) Setting a socket receive buffer size in DFSClient - Key: HDFS-7523 URL: https://issues.apache.org/jira/browse/HDFS-7523 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-7523-001.txt It would be nice if we could set a socket receive buffer size while creating the socket from the client (HBase) view; in old versions it would be in DFSInputStream, in trunk it seems it should be at: {code} @Override // RemotePeerFactory public Peer newConnectedPeer(InetSocketAddress addr, Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId) throws IOException { Peer peer = null; boolean success = false; Socket sock = null; try { sock = socketFactory.createSocket(); NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(), dfsClientConf.socketTimeout); peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this, blockToken, datanodeId); peer.setReadTimeout(dfsClientConf.socketTimeout); {code} e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); the default socket buffer size in Linux+JDK7 seems to be 8k if I am not wrong; this value is sometimes too small for HBase 64k block reads on a 10G network (at least, it causes more system calls) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7523) Setting a socket receive buffer size in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-7523: Attachment: HDFS-7523-001.txt Attached a small patch, similar to the write-side code in DFSOutputStream: {code} /** * Create a socket for a write pipeline * @param first the first datanode * @param length the pipeline length * @param client client * @return the socket connected to the first datanode */ static Socket createSocketForPipeline(final DatanodeInfo first, final int length, final DFSClient client) throws IOException { final String dnAddr = first.getXferAddr( client.getConf().connectToDnViaHostname); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Connecting to datanode " + dnAddr); } final InetSocketAddress isa = NetUtils.createSocketAddr(dnAddr); final Socket sock = client.socketFactory.createSocket(); final int timeout = client.getDatanodeReadTimeout(length); NetUtils.connect(sock, isa, client.getRandomLocalInterfaceAddr(), client.getConf().socketTimeout); sock.setSoTimeout(timeout); sock.setSendBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); {code} Setting a socket receive buffer size in DFSClient - Key: HDFS-7523 URL: https://issues.apache.org/jira/browse/HDFS-7523 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-7523-001.txt It would be nice if we could set a socket receive buffer size while creating the socket from the client (HBase) view; in old versions it would be in DFSInputStream, in trunk it seems it should be at: {code} @Override // RemotePeerFactory public Peer newConnectedPeer(InetSocketAddress addr, Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId) throws IOException { Peer peer = null; boolean success = false; Socket sock = null; try { sock = socketFactory.createSocket(); NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(), dfsClientConf.socketTimeout); peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock, this, blockToken, datanodeId); peer.setReadTimeout(dfsClientConf.socketTimeout); {code} e.g.: sock.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); the default socket buffer size in Linux+JDK7 seems to be 8k if I am not wrong; this value is sometimes too small for HBase 64k block reads on a 10G network (at least, it causes more system calls) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
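The mechanism HDFS-7523 proposes (requesting a larger SO_RCVBUF before connecting, mirroring the setSendBufferSize call on the write path) can be shown in isolation with plain java.net.Socket. A minimal sketch, not the patch itself; the demo class name and the 64 KB value are illustrative, and the kernel may round or clamp whatever size is requested:

```java
import java.net.Socket;
import java.net.SocketException;

// Minimal sketch: hint a receive buffer size on an unconnected socket and
// report what the OS actually granted.
public class ReceiveBufferDemo {
    static int requestReceiveBuffer(Socket sock, int bytes) throws SocketException {
        sock.setReceiveBufferSize(bytes);   // hint; must be set before connect()
        return sock.getReceiveBufferSize(); // size the OS actually granted
    }

    public static void main(String[] args) throws Exception {
        try (Socket sock = new Socket()) {
            int granted = requestReceiveBuffer(sock, 64 * 1024);
            System.out.println("granted receive buffer: " + granted);
        }
    }
}
```

Setting the hint before connect() matters because for TCP the receive buffer influences the window size advertised during the handshake; a larger buffer lets a 64k HBase block arrive in fewer reads on a fast network.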
[jira] [Created] (HDFS-7532) dncp_block_verification.log.prev too large
Arti Wadhwani created HDFS-7532: --- Summary: dncp_block_verification.log.prev too large Key: HDFS-7532 URL: https://issues.apache.org/jira/browse/HDFS-7532 Project: Hadoop HDFS Issue Type: Bug Reporter: Arti Wadhwani Priority: Minor Hi, using Hadoop version Hadoop 2.0.0-cdh4.7.0, I can see that on one datanode dncp_block_verification.log.prev is too large. Is it safe to delete this file? {noformat} -rw-r--r-- 1 hdfs hdfs 1166438426181 Oct 31 09:34 dncp_block_verification.log.prev -rw-r--r-- 1 hdfs hdfs 138576163 Dec 15 22:16 dncp_block_verification.log.curr {noformat} This is similar to HDFS-6114, but that is for the dncp_block_verification.log.curr file. Thanks, Arti Wadhwani -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-7471: --- Assignee: Binglin Chang TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-7525. - Resolution: Duplicate TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk --- Key: HDFS-7525 URL: https://issues.apache.org/jira/browse/HDFS-7525 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:0 but was:1 Stacktrace java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {quote} Found by the tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)