[jira] [Commented] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352175#comment-14352175 ] Tsz Wo Nicholas Sze commented on HDFS-7884: --- When I was working on HDFS-7746, the new test TestAppendSnapshotTruncate failed and I found NullPointerException in the log. NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
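The stack trace suggests that the volume lookup on that line returned null (e.g. the replica was removed concurrently, perhaps by the truncate recovery exercised in TestAppendSnapshotTruncate). A minimal, self-contained sketch of the defensive pattern, using hypothetical stand-in types rather than the real FsDatasetSpi/FsVolumeReference API, would be:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: guard the volume lookup so a missing replica surfaces as a
// checked IOException instead of an NPE. Types and names are illustrative
// stand-ins, not the actual DataNode classes.
public class VolumeLookupSketch {
  static class Volume {
    Volume obtainReference() { return this; }  // stand-in for a volume reference
  }

  private final Map<Long, Volume> volumesByBlockId = new HashMap<>();

  void addBlock(long blockId) { volumesByBlockId.put(blockId, new Volume()); }

  // Instead of calling obtainReference() directly on the lookup result,
  // check for null first: the replica may have been removed concurrently,
  // in which case the lookup returns null.
  Volume obtainVolumeReference(long blockId) throws IOException {
    Volume v = volumesByBlockId.get(blockId);
    if (v == null) {
      throw new IOException("Replica not found for block " + blockId);
    }
    return v.obtainReference();
  }
}
```

With this shape, a reader hitting a removed replica gets a descriptive IOException that the caller can handle, rather than an unexplained NPE in the DataXceiver thread.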
[jira] [Updated] (HDFS-7903) Cannot recover block after truncate and delete snapshot
[ https://issues.apache.org/jira/browse/HDFS-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7903: -- Attachment: (was: testMultipleTruncate.patch) Cannot recover block after truncate and delete snapshot --- Key: HDFS-7903 URL: https://issues.apache.org/jira/browse/HDFS-7903 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Fix For: 2.7.0 Attachments: testMultipleTruncate.patch # Create a file. # Create a snapshot. # Truncate the file in the middle of a block. # Delete the snapshot. The block cannot be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7903) Cannot recover block after truncate and delete snapshot
[ https://issues.apache.org/jira/browse/HDFS-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7903: -- Attachment: testMultipleTruncate.patch Cannot recover block after truncate and delete snapshot --- Key: HDFS-7903 URL: https://issues.apache.org/jira/browse/HDFS-7903 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Fix For: 2.7.0 Attachments: testMultipleTruncate.patch # Create a file. # Create a snapshot. # Truncate the file in the middle of a block. # Delete the snapshot. The block cannot be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7903) Cannot recover block after truncate and delete snapshot
[ https://issues.apache.org/jira/browse/HDFS-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7903: -- Attachment: testMultipleTruncate.patch testMultipleTruncate.patch: after this patch, the test fails. Cannot recover block after truncate and delete snapshot --- Key: HDFS-7903 URL: https://issues.apache.org/jira/browse/HDFS-7903 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Fix For: 2.7.0 Attachments: testMultipleTruncate.patch # Create a file. # Create a snapshot. # Truncate the file in the middle of a block. # Delete the snapshot. The block cannot be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7898) Change TestAppendSnapshotTruncate to fail-fast
[ https://issues.apache.org/jira/browse/HDFS-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352234#comment-14352234 ] Hadoop QA commented on HDFS-7898: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703293/h7898_20150309.patch against trunk revision 608ebd5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9788//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9788//console This message is automatically generated. Change TestAppendSnapshotTruncate to fail-fast -- Key: HDFS-7898 URL: https://issues.apache.org/jira/browse/HDFS-7898 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7898_20150309.patch - Add a timeout to TestAppendSnapshotTruncate. - DirWorker should check if its FileWorkers have error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7894) Rolling upgrade readiness is not updated in jmx until query command is issued.
[ https://issues.apache.org/jira/browse/HDFS-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7894 started by Brahma Reddy Battula. -- Rolling upgrade readiness is not updated in jmx until query command is issued. -- Key: HDFS-7894 URL: https://issues.apache.org/jira/browse/HDFS-7894 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7894-002.patch, HDFS-7894.patch When an HDFS rolling upgrade is started and a rollback image is created/uploaded, the active NN does not update its {{rollingUpgradeInfo}} until it receives a query command via RPC. This results in inconsistent info showing up in the web UI and its jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7460) Rewrite httpfs to use new shell framework
[ https://issues.apache.org/jira/browse/HDFS-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7460: --- Component/s: scripts Rewrite httpfs to use new shell framework - Key: HDFS-7460 URL: https://issues.apache.org/jira/browse/HDFS-7460 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: John Smith Labels: security Fix For: 3.0.0 Attachments: HDFS-7460-01.patch, HDFS-7460.patch httpfs shell code was not rewritten during HADOOP-9902. It should be modified to take advantage of the common shell framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7903) Cannot recover block after truncate and delete snapshot
Tsz Wo Nicholas Sze created HDFS-7903: - Summary: Cannot recover block after truncate and delete snapshot Key: HDFS-7903 URL: https://issues.apache.org/jira/browse/HDFS-7903 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze # Create a file. # Create a snapshot. # Truncate the file in the middle of a block. # Delete the snapshot. The block cannot be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7881) TestHftpFileSystem#testSeek fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352208#comment-14352208 ] Brahma Reddy Battula commented on HDFS-7881: After taking a deeper look into this issue, I am thinking of fixing it in one of two places: 1) ByteRangeInputStream (which could be handled as part of HDFS-3671), or 2) the test code (as in HDFS-3577). Since the earlier jiras (HDFS-3577, HDFS-3671 and HDFS-3788) were handled by [~szetszwo], [~daryn], [~eli] and [~tucu00], do you have any input for this jira? Please correct me if I am wrong. TestHftpFileSystem#testSeek fails in branch-2 - Key: HDFS-7881 URL: https://issues.apache.org/jira/browse/HDFS-7881 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Priority: Blocker TestHftpFileSystem#testSeek fails in branch-2. {code} --- T E S T S --- Running org.apache.hadoop.hdfs.web.TestHftpFileSystem Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.201 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestHftpFileSystem testSeek(org.apache.hadoop.hdfs.web.TestHftpFileSystem) Time elapsed: 0.054 sec ERROR! 
java.io.IOException: Content-Length is missing: {null=[HTTP/1.1 206 Partial Content], Date=[Wed, 04 Mar 2015 05:32:30 GMT, Wed, 04 Mar 2015 05:32:30 GMT], Expires=[Wed, 04 Mar 2015 05:32:30 GMT, Wed, 04 Mar 2015 05:32:30 GMT], Connection=[close], Content-Type=[text/plain; charset=utf-8], Server=[Jetty(6.1.26)], Content-Range=[bytes 7-9/10], Pragma=[no-cache, no-cache], Cache-Control=[no-cache]} at org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:132) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:104) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:181) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.web.TestHftpFileSystem.testSeek(TestHftpFileSystem.java:253) Results : Tests in error: TestHftpFileSystem.testSeek:253 » IO Content-Length is missing: {null=[HTTP/1 Tests run: 14, Failures: 0, Errors: 1, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
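The failing response above is a 206 with a Content-Range header (bytes 7-9/10) but no Content-Length. In principle the transfer length can be recovered from Content-Range; a hedged, self-contained sketch of that fallback (a hypothetical helper, not the actual ByteRangeInputStream change) would be:

```java
// Sketch: derive the payload length of an HTTP 206 response from its
// Content-Range header when Content-Length is missing.
// "bytes 7-9/10" covers bytes 7..9 inclusive, i.e. 9 - 7 + 1 = 3 bytes.
// Hypothetical helper for illustration, not the committed fix.
public class ContentRangeSketch {
  static long lengthFromContentRange(String contentRange) {
    // Expected form: "bytes <first>-<last>/<total>"
    String range = contentRange.substring("bytes ".length(), contentRange.indexOf('/'));
    int dash = range.indexOf('-');
    long first = Long.parseLong(range.substring(0, dash).trim());
    long last = Long.parseLong(range.substring(dash + 1).trim());
    return last - first + 1;  // inclusive byte range
  }
}
```

Whether the fix belongs in ByteRangeInputStream or in the test depends on whether the missing Content-Length is considered a server bug or an allowed response shape.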
[jira] [Commented] (HDFS-7894) Rolling upgrade readiness is not updated in jmx until query command is issued.
[ https://issues.apache.org/jira/browse/HDFS-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352169#comment-14352169 ] Brahma Reddy Battula commented on HDFS-7894: Thanks a lot for your pointers. Addressed your comments. :) Rolling upgrade readiness is not updated in jmx until query command is issued. -- Key: HDFS-7894 URL: https://issues.apache.org/jira/browse/HDFS-7894 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7894-002.patch, HDFS-7894.patch When an HDFS rolling upgrade is started and a rollback image is created/uploaded, the active NN does not update its {{rollingUpgradeInfo}} until it receives a query command via RPC. This results in inconsistent info showing up in the web UI and its jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7884: -- Attachment: org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt: NPE inside. NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker Attachments: org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7884) NullPointerException in BlockSender
[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352180#comment-14352180 ] Brahma Reddy Battula commented on HDFS-7884: Thanks a lot for your input...let me look around more... NullPointerException in BlockSender --- Key: HDFS-7884 URL: https://issues.apache.org/jira/browse/HDFS-7884 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Blocker Attachments: org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt {noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:264) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249) at java.lang.Thread.run(Thread.java:745) {noformat} BlockSender.java:264 is shown below {code} this.volumeRef = datanode.data.getVolume(block).obtainReference(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7894) Rolling upgrade readiness is not updated in jmx until query command is issued.
[ https://issues.apache.org/jira/browse/HDFS-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7894: --- Attachment: HDFS-7894-002.patch Rolling upgrade readiness is not updated in jmx until query command is issued. -- Key: HDFS-7894 URL: https://issues.apache.org/jira/browse/HDFS-7894 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7894-002.patch, HDFS-7894.patch When an HDFS rolling upgrade is started and a rollback image is created/uploaded, the active NN does not update its {{rollingUpgradeInfo}} until it receives a query command via RPC. This results in inconsistent info showing up in the web UI and its jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7904) NFS hard codes ShellBasedIdMapping
Allen Wittenauer created HDFS-7904: -- Summary: NFS hard codes ShellBasedIdMapping Key: HDFS-7904 URL: https://issues.apache.org/jira/browse/HDFS-7904 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer The current NFS doesn't allow one to configure an alternative to the shell-based id mapping provider. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352357#comment-14352357 ] Allen Wittenauer commented on HDFS-7877: Isn't this effectively a dupe of HDFS-6729? Support maintenance state for datanodes --- Key: HDFS-7877 URL: https://issues.apache.org/jira/browse/HDFS-7877 Project: Hadoop HDFS Issue Type: New Feature Reporter: Ming Ma Attachments: HDFS-7877.patch, Supportmaintenancestatefordatanodes.pdf This requirement came up during the design for HDFS-7541. Given this feature is mostly independent of upgrade domain feature, it is better to track it under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7360: --- Attachment: HDFS-7360-pnative.004.patch In the new patch: 1) fix the JNI library link order issue; 2) generate functiontest.xml after the function test. Test libhdfs3 against MiniDFSCluster Key: HDFS-7360 URL: https://issues.apache.org/jira/browse/HDFS-7360 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Zhanwei Wang Priority: Critical Attachments: HDFS-7360-pnative.002.patch, HDFS-7360-pnative.003.patch, HDFS-7360-pnative.004.patch, HDFS-7360.patch Currently the branch has enough code to interact with HDFS servers. We should test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7904) NFS hard codes ShellBasedIdMapping
[ https://issues.apache.org/jira/browse/HDFS-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7904: --- Component/s: nfs NFS hard codes ShellBasedIdMapping -- Key: HDFS-7904 URL: https://issues.apache.org/jira/browse/HDFS-7904 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Allen Wittenauer The current NFS doesn't allow one to configure an alternative to the shell-based id mapping provider. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7904) NFS hard codes ShellBasedIdMapping
[ https://issues.apache.org/jira/browse/HDFS-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-7904: -- Assignee: Brahma Reddy Battula NFS hard codes ShellBasedIdMapping -- Key: HDFS-7904 URL: https://issues.apache.org/jira/browse/HDFS-7904 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula The current NFS doesn't allow one to configure an alternative to the shell-based id mapping provider. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage
[ https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352435#comment-14352435 ] Hui Zheng commented on HDFS-7827: - To support striped blocks in the non-protobuf fsimage, we only need to persist striped blocks in non-protobuf format. The BlockInfo class uses the Writable interface to persist itself in non-protobuf format, so we can override the write method to implement it. Erasure Coding: support striped blocks in non-protobuf fsimage -- Key: HDFS-7827 URL: https://issues.apache.org/jira/browse/HDFS-7827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Hui Zheng HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. We should also add this support to the non-protobuf fsimage since it is still used for use cases like offline image processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
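The override idea in the comment above can be sketched with simplified stand-in classes: a base block serializes its fields via a Writable-style write(DataOutput), and a striped subclass overrides write() to append its extra fields. Class and field names below are illustrative, not the actual BlockInfo hierarchy.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Simplified sketch of overriding a Writable-style write() so a striped
// block persists extra fields after the base block's fields.
public class WritableBlockSketch {
  static class Block {
    long blockId, numBytes, genStamp;
    Block(long id, long len, long gs) { blockId = id; numBytes = len; genStamp = gs; }
    void write(DataOutput out) throws IOException {
      out.writeLong(blockId);
      out.writeLong(numBytes);
      out.writeLong(genStamp);
    }
    void readFields(DataInput in) throws IOException {
      blockId = in.readLong();
      numBytes = in.readLong();
      genStamp = in.readLong();
    }
  }

  static class StripedBlock extends Block {
    short dataBlockNum;  // illustrative extra field a striped block might persist
    StripedBlock(long id, long len, long gs, short d) { super(id, len, gs); dataBlockNum = d; }
    @Override
    void write(DataOutput out) throws IOException {
      super.write(out);            // reuse the base-class serialization
      out.writeShort(dataBlockNum); // then append the striping metadata
    }
  }
}
```

The loader side would need the symmetric override of readFields, plus a marker so it knows whether to expect the extra fields.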
[jira] [Updated] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7360: --- Attachment: (was: HDFS-7360-pnative.004.patch) Test libhdfs3 against MiniDFSCluster Key: HDFS-7360 URL: https://issues.apache.org/jira/browse/HDFS-7360 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Zhanwei Wang Priority: Critical Attachments: HDFS-7360-pnative.002.patch, HDFS-7360-pnative.003.patch, HDFS-7360-pnative.004.patch, HDFS-7360.patch Currently the branch has enough code to interact with HDFS servers. We should test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7360: --- Attachment: HDFS-7360-pnative.004.patch Test libhdfs3 against MiniDFSCluster Key: HDFS-7360 URL: https://issues.apache.org/jira/browse/HDFS-7360 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Zhanwei Wang Priority: Critical Attachments: HDFS-7360-pnative.002.patch, HDFS-7360-pnative.003.patch, HDFS-7360-pnative.004.patch, HDFS-7360.patch Currently the branch has enough code to interact with HDFS servers. We should test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7891) a block placement policy with best fault tolerance
[ https://issues.apache.org/jira/browse/HDFS-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-7891: Attachment: HDFS-7891.patch a block placement policy with best fault tolerance -- Key: HDFS-7891 URL: https://issues.apache.org/jira/browse/HDFS-7891 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-7891.patch A block placement policy that tries its best to spread replicas across as many racks as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
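One way to read "spread replicas across as many racks as possible" is a greedy round-robin over racks: take one node per rack per pass until the requested replica count is reached, so the number of distinct racks used is maximized. A self-contained sketch of that idea (not the attached patch, whose actual algorithm is not shown here):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Greedy sketch: round-robin over racks, one node per rack per pass,
// so replicas land on as many distinct racks as possible.
public class RackSpreadSketch {
  // racks: rack name -> nodes on that rack; returns the chosen node names.
  static List<String> choose(Map<String, List<String>> racks, int replicas) {
    List<String> chosen = new ArrayList<>();
    Map<String, Integer> nextIndex = new LinkedHashMap<>();
    for (String rack : racks.keySet()) nextIndex.put(rack, 0);
    while (chosen.size() < replicas) {
      boolean progress = false;
      for (Map.Entry<String, Integer> e : nextIndex.entrySet()) {
        List<String> nodes = racks.get(e.getKey());
        if (e.getValue() < nodes.size() && chosen.size() < replicas) {
          chosen.add(nodes.get(e.getValue()));  // take the next unused node on this rack
          e.setValue(e.getValue() + 1);
          progress = true;
        }
      }
      if (!progress) break;  // fewer nodes available than requested replicas
    }
    return chosen;
  }
}
```

With 3 replicas and 3 racks this uses one node from each rack; extra replicas wrap around to racks that still have unused nodes.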
[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352471#comment-14352471 ] Ming Ma commented on HDFS-7877: --- Thanks Allen for pointing this out. We didn't know about HDFS-6729 at all. Let me check out the approach in that jira so we can combine the effort. Support maintenance state for datanodes --- Key: HDFS-7877 URL: https://issues.apache.org/jira/browse/HDFS-7877 Project: Hadoop HDFS Issue Type: New Feature Reporter: Ming Ma Attachments: HDFS-7877.patch, Supportmaintenancestatefordatanodes.pdf This requirement came up during the design for HDFS-7541. Given this feature is mostly independent of upgrade domain feature, it is better to track it under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6729) Support maintenance mode for DN
[ https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352477#comment-14352477 ] Ming Ma commented on HDFS-6729: --- Eddy, thanks for the work. We didn't know about this at all until Allen pointed it out in HDFS-7877. Sounds like we should combine the effort. Maybe we can step back and discuss the design. There are a couple of key things we want to take care of; it would be great if you can check out the design there. 1. Admin interface. Based on our admins' input, it seems dfsadmin -refreshNodes might be easier to use. 2. DN state machine. We define two new maintenance states, ENTERING_MAINTENANCE and IN_MAINTENANCE. This takes care of the case where there are no replicas on other datanodes. It also takes care of the different state transitions, e.g., from decommission states to maintenance states. 3. Block management. We also enforce restrictions on read and write operations when machines are in maintenance states. Looking forward to the collaboration. Support maintenance mode for DN --- Key: HDFS-6729 URL: https://issues.apache.org/jira/browse/HDFS-6729 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch, HDFS-6729.005.patch Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode takes only a short amount of time (e.g., 10 minutes). In these cases, the users do not want missing blocks to be reported for this DN because the DN will be back online shortly without data loss. Thus, we need a maintenance mode for a DN so that maintenance work can be carried out on the DN without having to decommission it or having the DN marked as dead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
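The state-machine point in the comment above can be sketched as an enum plus a legal-transition table. The two maintenance states come from the comment; which transitions are legal is an assumption made here for illustration, not the committed design.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Sketch of the DN admin-state machine discussed above. The transition
// table below is an illustrative assumption, not the agreed design.
public class MaintenanceStateSketch {
  enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED,
                    ENTERING_MAINTENANCE, IN_MAINTENANCE }

  private static final Map<AdminState, EnumSet<AdminState>> LEGAL =
      new EnumMap<>(AdminState.class);
  static {
    LEGAL.put(AdminState.NORMAL,
        EnumSet.of(AdminState.DECOMMISSION_INPROGRESS, AdminState.ENTERING_MAINTENANCE));
    // The comment mentions transitions from decommission states to maintenance states.
    LEGAL.put(AdminState.DECOMMISSION_INPROGRESS,
        EnumSet.of(AdminState.DECOMMISSIONED, AdminState.ENTERING_MAINTENANCE, AdminState.NORMAL));
    LEGAL.put(AdminState.DECOMMISSIONED, EnumSet.of(AdminState.NORMAL));
    LEGAL.put(AdminState.ENTERING_MAINTENANCE,
        EnumSet.of(AdminState.IN_MAINTENANCE, AdminState.NORMAL));
    LEGAL.put(AdminState.IN_MAINTENANCE, EnumSet.of(AdminState.NORMAL));
  }

  static boolean canTransition(AdminState from, AdminState to) {
    return LEGAL.getOrDefault(from, EnumSet.noneOf(AdminState.class)).contains(to);
  }
}
```

Encoding the transitions explicitly makes it easy to reject illegal admin requests (e.g. jumping straight from DECOMMISSIONED into IN_MAINTENANCE) with a clear error.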
[jira] [Updated] (HDFS-7854) Separate class DataStreamer out of DFSOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7854: Attachment: HDFS-7854-003.patch Patch 003 fixes one javac warning. The two findbugs warnings appear to have existed before the refactoring; they are reported as new only because DataStreamer is separated out as a new class. Separate class DataStreamer out of DFSOutputStream -- Key: HDFS-7854 URL: https://issues.apache.org/jira/browse/HDFS-7854 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-7854-001.patch, HDFS-7854-002.patch, HDFS-7854-003.patch This sub-task separates DataStreamer from DFSOutputStream. The new DataStreamer will accept packets and write them to remote datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7857) Incomplete information in authentication failure WARN message caused user confusion
[ https://issues.apache.org/jira/browse/HDFS-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7857: Summary: Incomplete information in authentication failure WARN message caused user confusion (was: Incomplete information in WARN message caused user confusion) Incomplete information in authentication failure WARN message caused user confusion --- Key: HDFS-7857 URL: https://issues.apache.org/jira/browse/HDFS-7857 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yongjun Zhang Assignee: Yongjun Zhang Labels: supportability Attachments: HDFS-7857.001.patch Lots of the following messages appeared in the NN log: {quote} 2014-12-10 12:18:15,728 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39838:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,728 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] .. SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39843:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,790 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] {quote} The real reason for the failure is the second message, about StandbyException. However, the first message is confusing because it talks about DIGEST-MD5: IO error acquiring password. Filing this jira to modify the first message to have more comprehensive information that can be obtained from {{getCauseForInvalidToken(e)}}. 
{code}
try {
  saslResponse = processSaslMessage(saslMessage);
} catch (IOException e) {
  rpcMetrics.incrAuthenticationFailures();
  // attempting user could be null
  AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser
      + " (" + e.getLocalizedMessage() + ")");
  throw (IOException) getCauseForInvalidToken(e);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
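The improvement the jira asks for could look like the following self-contained sketch: include the unwrapped cause of the SASL failure (e.g. the StandbyException) in the WARN line instead of only the generic "DIGEST-MD5: IO error acquiring password" text. The helper and its signature are hypothetical, for illustration only.

```java
// Hypothetical sketch of the improved auth-failure WARN message: append
// what getCauseForInvalidToken(e) would return, so the operator sees the
// real cause instead of only the generic SASL text. Names are illustrative.
public class AuthFailureMessageSketch {
  static String buildWarn(String connection, String user,
                          Throwable saslError, Throwable cause) {
    StringBuilder sb = new StringBuilder("Auth failed for ").append(connection)
        .append(":").append(user)
        .append(" (").append(saslError.getMessage()).append(")");
    if (cause != null) {  // the unwrapped cause, when one is available
      sb.append(" caused by ").append(cause);
    }
    return sb.toString();
  }
}
```

With the cause appended, the pair of confusing log lines collapses into one message that names the StandbyException directly.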
[jira] [Updated] (HDFS-7857) Improve authentication failure WARN message to avoid user confusion
[ https://issues.apache.org/jira/browse/HDFS-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7857: Summary: Improve authentication failure WARN message to avoid user confusion (was: Incomplete information in authentication failure WARN message caused user confusion) Improve authentication failure WARN message to avoid user confusion --- Key: HDFS-7857 URL: https://issues.apache.org/jira/browse/HDFS-7857 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yongjun Zhang Assignee: Yongjun Zhang Labels: supportability Attachments: HDFS-7857.001.patch Lots of the following messages appeared in the NN log: {quote} 2014-12-10 12:18:15,728 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39838:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,728 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] .. SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39843:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,790 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] {quote} The real reason for the failure is the second message, about StandbyException. However, the first message is confusing because it talks about DIGEST-MD5: IO error acquiring password. Filing this jira to modify the first message to have more comprehensive information that can be obtained from {{getCauseForInvalidToken(e)}}. 
{code}
try {
  saslResponse = processSaslMessage(saslMessage);
} catch (IOException e) {
  rpcMetrics.incrAuthenticationFailures();
  // attempting user could be null
  AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser
      + " (" + e.getLocalizedMessage() + ")");
  throw (IOException) getCauseForInvalidToken(e);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7857) Improve authentication failure WARN message to avoid user confusion
[ https://issues.apache.org/jira/browse/HDFS-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352500#comment-14352500 ] Hudson commented on HDFS-7857: -- FAILURE: Integrated in Hadoop-trunk-Commit #7285 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7285/]) HDFS-7857. Improve authentication failure WARN message to avoid user confusion. Contributed by Yongjun Zhang. (yzhang: rev d799fbe1ccf8752c44f087e34b5f400591d3b5bd) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java Improve authentication failure WARN message to avoid user confusion --- Key: HDFS-7857 URL: https://issues.apache.org/jira/browse/HDFS-7857 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yongjun Zhang Assignee: Yongjun Zhang Labels: supportability Attachments: HDFS-7857.001.patch Lots of the following messages appeared in the NN log: {quote} 2014-12-10 12:18:15,728 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39838:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,728 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] .. SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39843:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,790 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] {quote} The real reason for the failure is the second message, about StandbyException. However, the first message is confusing because it talks about DIGEST-MD5: IO error acquiring password. 
Filing this jira to modify the first message to have more comprehensive information that can be obtained from {{getCauseForInvalidToken(e)}}.
{code}
try {
  saslResponse = processSaslMessage(saslMessage);
} catch (IOException e) {
  rpcMetrics.incrAuthenticationFailures();
  // attempting user could be null
  AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser
      + " (" + e.getLocalizedMessage() + ")");
  throw (IOException) getCauseForInvalidToken(e);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7857) Improve authentication failure WARN message to avoid user confusion
[ https://issues.apache.org/jira/browse/HDFS-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352509#comment-14352509 ] Yongjun Zhang commented on HDFS-7857: - Thanks a lot [~jingzhao]! I tried to commit, and then I realized that I had to do two things: 1) move this jira to HADOOP instead of HDFS; 2) modify the CHANGES.txt file, which I had forgotten. So I reverted the commit. My bad. Improve authentication failure WARN message to avoid user confusion --- Key: HDFS-7857 URL: https://issues.apache.org/jira/browse/HDFS-7857 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yongjun Zhang Assignee: Yongjun Zhang Labels: supportability Attachments: HDFS-7857.001.patch Lots of the following messages appeared in the NN log: {quote} 2014-12-10 12:18:15,728 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39838:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,728 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] .. SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39843:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,790 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] {quote} The real reason for the failure is the second message, about the StandbyException; however, the first message is confusing because it talks about "DIGEST-MD5: IO error acquiring password". Filing this jira to modify the first message to have more comprehensive information that can be obtained from {{getCauseForInvalidToken(e)}}. 
{code} try { saslResponse = processSaslMessage(saslMessage); } catch (IOException e) { rpcMetrics.incrAuthenticationFailures(); // attempting user could be null AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ")"); throw (IOException) getCauseForInvalidToken(e); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7857) Improve authentication failure WARN message to avoid user confusion
[ https://issues.apache.org/jira/browse/HDFS-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352516#comment-14352516 ] Hudson commented on HDFS-7857: -- FAILURE: Integrated in Hadoop-trunk-Commit #7286 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7286/]) Revert HDFS-7857. Improve authentication failure WARN message to avoid user confusion. Contributed by Yongjun Zhang. (yzhang: rev 5578e22ce9ca6cceb510c22ec307b1869ce1a7c5) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java Improve authentication failure WARN message to avoid user confusion --- Key: HDFS-7857 URL: https://issues.apache.org/jira/browse/HDFS-7857 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yongjun Zhang Assignee: Yongjun Zhang Labels: supportability Attachments: HDFS-7857.001.patch Lots of the following messages appeared in the NN log: {quote} 2014-12-10 12:18:15,728 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39838:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,728 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] .. SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ipAddress:39843:null (DIGEST-MD5: IO error acquiring password) 2014-12-10 12:18:15,790 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client ipAddress threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby] {quote} The real reason for the failure is the second message, about the StandbyException; however, the first message is confusing because it talks about "DIGEST-MD5: IO error acquiring password". 
Filing this jira to modify the first message to have more comprehensive information that can be obtained from {{getCauseForInvalidToken(e)}}. {code} try { saslResponse = processSaslMessage(saslMessage); } catch (IOException e) { rpcMetrics.incrAuthenticationFailures(); // attempting user could be null AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ")"); throw (IOException) getCauseForInvalidToken(e); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-7068: Attachment: HDFS-7068.patch Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work well. For example, on a shared cluster we may want to erasure-encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies in HDFS. One plain thought is that the default placement policy is still configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
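The xattr-fallback idea in the description above can be sketched as follows. This is a hedged illustration, not HDFS code: the map stands in for per-directory extended attributes, and the xattr name, policy names, and path walk are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the xattr-based fallback described in the jira: look up a
// per-path placement policy; if none is found on the path or any ancestor,
// fall back to the default. The map and names are illustrative, not HDFS APIs.
public class PolicySelectorSketch {
    static final String DEFAULT_POLICY = "BlockPlacementPolicyDefault";

    // Stand-in for xattrs such as a hypothetical "user.block.placement.policy".
    static final Map<String, String> XATTRS = new HashMap<>();

    // The nearest ancestor (or the path itself) with a policy xattr wins;
    // otherwise fall back to the default placement policy.
    static String choosePolicy(String path) {
        for (String p = path; !p.isEmpty(); p = parent(p)) {
            String policy = XATTRS.get(p);
            if (policy != null) {
                return policy;
            }
        }
        return DEFAULT_POLICY;
    }

    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "" : path.substring(0, i);
    }

    public static void main(String[] args) {
        XATTRS.put("/ec", "BlockPlacementPolicyFaultTolerant");
        System.out.println(choosePolicy("/ec/data/file")); // customized policy
        System.out.println(choosePolicy("/user/file"));    // default policy
    }
}
```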
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: (was: DATABFDT-MetadataMovingToolDesignProposal-Efforttoarealdistributednamenode-050215-1415-202.pdf) Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Overflow+Table+Design+–+Record+Moved+Namespace+Locations.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7737) Implement distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7737: Attachment: Replication Of Name Nodes.pdf Dynamic Subtree Binary Partitioning Proposal.pdf Distributed Namenode.pdf Implement distributed namenode -- Key: HDFS-7737 URL: https://issues.apache.org/jira/browse/HDFS-7737 Project: Hadoop HDFS Issue Type: New Feature Reporter: Ray Zhang Assignee: Ray Zhang Priority: Minor Attachments: Distributed Namenode.pdf, Dynamic Subtree Binary Partitioning Proposal.pdf, Replication Of Name Nodes.pdf Try to add the following functions to HDFS: 1) Manually move metadata across different name nodes, without losing the original data locality. 2) Automatically balance metadata across different name nodes with a novel subtree partition algorithm. 3) Allow setting a replication factor for each name node; each replicated name node can balance live read/write traffic. 4) All the functionalities above are highly configurable, and the administrator can decide whether to use the distributed name node setup or just HDFS federation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7737) Implement distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7737: Attachment: Implement Overflow Table.pdf Implement distributed namenode -- Key: HDFS-7737 URL: https://issues.apache.org/jira/browse/HDFS-7737 Project: Hadoop HDFS Issue Type: New Feature Reporter: Ray Zhang Assignee: Ray Zhang Priority: Minor Attachments: Distributed Namenode.pdf, Dynamic Subtree Binary Partitioning Proposal.pdf, Implement Overflow Table.pdf, Replication Of Name Nodes.pdf Try to add the following functions to HDFS: 1) Manually move metadata across different name nodes, without losing the original data locality. 2) Automatically balance metadata across different name nodes with a novel subtree partition algorithm. 3) Allow setting a replication factor for each name node; each replicated name node can balance live read/write traffic. 4) All the functionalities above are highly configurable, and the administrator can decide whether to use the distributed name node setup or just HDFS federation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: (was: Implement Overflow Table.pdf) Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Metadata Moving Tool.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352558#comment-14352558 ] Walter Su commented on HDFS-7068: - I think putting the block placement policy selection strategy into StoragePolicy is not a good idea. First, placement policy selection and StorageType selection are orthogonal. Second, StoragePolicy is used by the block placement policy; if we also use StoragePolicy to select the block placement policy, the two classes become tangled. In the patch, I hard-coded the selection schema. Maybe we can add a property like {{placementPolicy.schema}} with a value such as {{[default]com.package.DefaultPolicy, [XOR]com.package.FaultTolerantPolicy, [RS]com.package.FaultTolerantPolicy}} Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work well. For example, on a shared cluster we may want to erasure-encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies in HDFS. One plain thought is that the default placement policy is still configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? 
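Written out as a purely hypothetical hdfs-site.xml entry, the property sketched in the comment above might look like the fragment below; the key name, the bracketed-schema value syntax, and the class names are assumptions taken from the comment, not an existing HDFS configuration:

```xml
<!-- Hypothetical property following the sketch in the comment above;
     not an existing HDFS configuration key. Each [schema] token maps an
     erasure-coding schema to a block placement policy class. -->
<property>
  <name>placementPolicy.schema</name>
  <value>
    [default]com.package.DefaultPolicy,
    [XOR]com.package.FaultTolerantPolicy,
    [RS]com.package.FaultTolerantPolicy
  </value>
</property>
```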
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Rajan updated HDFS-5809: --- Environment: jdk1.6/java 1.7, centos6.4/debian6, 2.0.0-cdh4.5.0 (was: jdk1.6, centos6.4, 2.0.0-cdh4.5.0) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop Key: HDFS-5809 URL: https://issues.apache.org/jira/browse/HDFS-5809 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.0-alpha Environment: jdk1.6/java 1.7, centos6.4/debian6, 2.0.0-cdh4.5.0 Reporter: ikweesung Assignee: Colin Patrick McCabe Priority: Critical Labels: blockpoolslicescanner, datanode, infinite-loop Fix For: 2.6.0 Attachments: HDFS-5809.001.patch {{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other conditions like a timeout have occurred.) In order to do this, it calls {{BlockPoolSliceScanner#verifyFirstBlock}}. This is intended to grab the first block in the {{blockInfoSet}}, verify it, and remove it from that set. ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a certain bug in {{updateScanStatus}}, the block may never be removed from {{blockInfoSet}}. When this happens, we keep rescanning the exact same block until the timeout hits. The bug is triggered when a block winds up in {{blockInfoSet}} but not in {{blockMap}}. You can see it clearly in this code: {code} private synchronized void updateScanStatus(Block block, ScanType type, boolean scanOk) { BlockScanInfo info = blockMap.get(block); if ( info != null ) { delBlockInfo(info); } else { // It might already be removed. Thats ok, it will be caught next time. info = new BlockScanInfo(block); } {code} If {{info == null}}, we never call {{delBlockInfo}}, the function which is intended to remove the {{blockInfoSet}} entry. Luckily, there is a simple fix here... 
the variable that {{updateScanStatus}} is being passed is actually a {{BlockScanInfo}} object, so we can simply call {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}. This is both faster and more robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
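The fix described above can be sketched as a small, self-contained example. This is a hedged illustration of the shape of the fix, not the actual BlockPoolSliceScanner code: because the caller already holds the {{BlockScanInfo}} it took from {{blockInfoSet}}, the entry can be removed from the set directly, with no {{blockMap}} lookup that might miss and leave a stale entry behind.

```java
import java.util.TreeSet;

// Hedged sketch of the fix, not the actual BlockPoolSliceScanner code: the
// caller passes the BlockScanInfo it took from the set, so we can delete it
// unconditionally instead of looking it up in a map that may not contain it.
public class ScanStatusSketch {

    static class BlockScanInfo implements Comparable<BlockScanInfo> {
        final long blockId;
        long lastScanTime;

        BlockScanInfo(long blockId) {
            this.blockId = blockId;
        }

        // blockInfoSet is ordered by last scan time (ties broken by block id).
        public int compareTo(BlockScanInfo o) {
            int c = Long.compare(lastScanTime, o.lastScanTime);
            return c != 0 ? c : Long.compare(blockId, o.blockId);
        }
    }

    final TreeSet<BlockScanInfo> blockInfoSet = new TreeSet<>();

    synchronized void updateScanStatus(BlockScanInfo info, boolean scanOk) {
        // Fixed shape: unconditional delBlockInfo(info) on the object itself,
        // so the entry can never linger in the set and be rescanned forever.
        blockInfoSet.remove(info);
        // Mutate the sort key only while the entry is out of the set.
        info.lastScanTime = System.currentTimeMillis();
        blockInfoSet.add(info);
    }

    public static void main(String[] args) {
        ScanStatusSketch scanner = new ScanStatusSketch();
        BlockScanInfo info = new BlockScanInfo(1L);
        scanner.blockInfoSet.add(info);
        scanner.updateScanStatus(info, true);
        // The entry was re-queued with a fresh scan time, not duplicated.
        System.out.println(scanner.blockInfoSet.size()); // 1
    }
}
```

Removing before mutating matters here: changing the sort key of an element while it sits inside a TreeSet corrupts the set's ordering invariants.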
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352568#comment-14352568 ] Walter Su commented on HDFS-7068: - part of this patch is for HDFS-7613. Maybe I should submit the patch there. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implement of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most of the circumstances, but under some special circumstances, it works not so well. For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy. But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for the HDFS. One plain thought is that, the default placement policy is still configured as the default. On the other hand, HDFS can let user specify customized placement policy through the extended attributes(xattr). When the HDFS choose the replica targets, it firstly check the customized placement policy, if not specified, it fallbacks to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352567#comment-14352567 ] Walter Su commented on HDFS-7068: - Part of this patch is for HDFS-7613; maybe I should submit the patch there. Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Walter Su Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work well. For example, on a shared cluster we may want to erasure-encode all the files under some specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Here we need to support multiple placement policies in HDFS. One plain thought is that the default placement policy is still configured as the default, while HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy; if none is specified, it falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups
[ https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352571#comment-14352571 ] Walter Su commented on HDFS-7613: - This jira is implemented by HDFS-7891 and HDFS-7068: 1. Add a BlockPlacementPolicyFaultTolerant class in HDFS-7891. 2. Make striped files use BlockPlacementPolicyFaultTolerant in HDFS-7068. This is what I think the implementation of this jira should be; maybe it's not good enough. I have submitted the first version of the patches for the above two jiras, and I'm really looking forward to some suggestions. Block placement policy for erasure coding groups Key: HDFS-7613 URL: https://issues.apache.org/jira/browse/HDFS-7613 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Walter Su Blocks in an erasure coding group should be placed in different failure domains -- different DataNodes at the minimum, and different racks ideally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: (was: Overflow+Table+Design+–+Record+Moved+Namespace+Locations.pdf) Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Metadata Moving Tool.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: Metadata Moving Tool.pdf Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Metadata Moving Tool.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: Implement Overflow Table.pdf Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Implement Overflow Table.pdf, Metadata Moving Tool.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: Impement Overflow Table.pdf Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Metadata Moving Tool.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode
[ https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Zhang updated HDFS-7702: Attachment: (was: Impement Overflow Table.pdf) Move metadata across namenode - Effort to a real distributed namenode - Key: HDFS-7702 URL: https://issues.apache.org/jira/browse/HDFS-7702 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ray Zhang Assignee: Ray Zhang Attachments: Metadata Moving Tool.pdf Implement a tool that can show the in-memory namespace tree structure with weights (sizes), and an API that can move metadata across different namenodes. The purpose is to move data efficiently and quickly, without moving blocks on the datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-7411: Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Nicholas. I looked at a diff from v10 of the patch, and checked the new unit test and additional logic for exiting the decom loop at the node limit. I committed this. Thanks, Andrew Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.7.0 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352423#comment-14352423 ] Zhanwei Wang commented on HDFS-7360: I have set up CI on travis-ci.org to test my patch. Building and testing on Ubuntu 12.04 is OK. https://travis-ci.org/PivotalRD/libhdfs3/builds Please review the patch and give some comments. Thanks. Test libhdfs3 against MiniDFSCluster Key: HDFS-7360 URL: https://issues.apache.org/jira/browse/HDFS-7360 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Zhanwei Wang Priority: Critical Attachments: HDFS-7360-pnative.002.patch, HDFS-7360-pnative.003.patch, HDFS-7360-pnative.004.patch, HDFS-7360.patch Currently the branch has enough code to interact with HDFS servers. We should test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352424#comment-14352424 ] Hudson commented on HDFS-7411: -- FAILURE: Integrated in Hadoop-trunk-Commit #7281 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7281/]) HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang 
Assignee: Andrew Wang Fix For: 2.7.0 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352115#comment-14352115 ] Tsz Wo Nicholas Sze commented on HDFS-7411: --- If you believe that the new patch could keep the existing behavior, I am happy to remove my -1. Unfortunately, I won't be able to review the patch. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7769) TestHDFSCLI create files in hdfs project root dir
[ https://issues.apache.org/jira/browse/HDFS-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352119#comment-14352119 ] Tsz Wo Nicholas Sze commented on HDFS-7769: --- ... It's been enforced fairly strictly in MapReduce/YARN. ... That seems not to be the case: it appears that MAPREDUCE-1893 and MAPREDUCE-1124 were committed without a committer +1, and MAPREDUCE-1615 and MAPREDUCE-1454 were committed without being reviewed. The list is definitely not exhaustive. TestHDFSCLI create files in hdfs project root dir - Key: HDFS-7769 URL: https://issues.apache.org/jira/browse/HDFS-7769 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Priority: Trivial Fix For: 2.7.0 After running TestHDFSCLI, two files (data and .data.crc) remain in the hdfs project root dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7898) Change TestAppendSnapshotTruncate to fail-fast
[ https://issues.apache.org/jira/browse/HDFS-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7898: -- Attachment: h7898_20150309.patch h7898_20150309.patch: adds a timeout and changes DirWorker to check for errors. Change TestAppendSnapshotTruncate to fail-fast -- Key: HDFS-7898 URL: https://issues.apache.org/jira/browse/HDFS-7898 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7898_20150309.patch - Add a timeout to TestAppendSnapshotTruncate. - DirWorker should check if its FileWorkers have errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7898) Change TestAppendSnapshotTruncate to fail-fast
[ https://issues.apache.org/jira/browse/HDFS-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7898: -- Status: Patch Available (was: Open) Change TestAppendSnapshotTruncate to fail-fast -- Key: HDFS-7898 URL: https://issues.apache.org/jira/browse/HDFS-7898 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7898_20150309.patch - Add a timeout to TestAppendSnapshotTruncate. - DirWorker should check if its FileWorkers have errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)