[jira] [Commented] (HDFS-3995) Use DFSTestUtil.createFile() for file creation and writing in test cases
[ https://issues.apache.org/jira/browse/HDFS-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468217#comment-13468217 ]

Jing Zhao commented on HDFS-3995:

The failed test cases seem unrelated:
TestPersistBlocks#TestRestartDfsWithFlush -- HDFS-3811
TestNameNodeMetrics#testCorruptBlock -- HDFS-2434
TestBPOfferService#testBasicFunctionality -- HDFS-3930
TestHdfsNativeCodeLoader#testNativeCodeLoaded -- HDFS-3753

Use DFSTestUtil.createFile() for file creation and writing in test cases

Key: HDFS-3995
URL: https://issues.apache.org/jira/browse/HDFS-3995
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Attachments: HDFS-3995.trunk.001.patch

Currently many tests define and use their own methods to create a file and write a number of blocks in MiniDFSCluster. These methods can be consolidated into DFSTestUtil.createFile().

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
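The consolidation HDFS-3995 describes replaces per-test file writers with one shared utility. As a hedged, Hadoop-free sketch (the class and method below are illustrative stand-ins, not the real DFSTestUtil, though they mirror its seed-driven contract):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Illustrative stand-in for DFSTestUtil.createFile(): many tests hand-roll a
// "create file, write N bytes" helper; consolidating them into one utility that
// writes seed-derived pseudo-random data makes test output deterministic and
// removes duplicated code. This sketch uses the local filesystem only.
final class TestFileUtil {
  private TestFileUtil() {}

  /** Create a file of fileLen bytes of deterministic pseudo-random data. */
  static void createFile(Path file, long fileLen, long seed) throws IOException {
    byte[] data = new byte[(int) fileLen];
    new Random(seed).nextBytes(data);   // same seed => same bytes, every run
    Files.write(file, data);            // one call replaces ad-hoc per-test writers
  }
}
```

Because the data is a pure function of the seed, a test can recreate the exact same file contents when verifying reads.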
[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-2434:
Attachment: HDFS-2434.001.patch

Based on Kihwal's analysis, can we solve the problem with the CorruptBlocks metric by disabling the heartbeats of the datanodes before marking the block as corrupt?

TestNameNodeMetrics.testCorruptBlock fails intermittently

Key: HDFS-2434
URL: https://issues.apache.org/jira/browse/HDFS-2434
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Uma Maheswara Rao G
Attachments: HDFS-2434.001.patch

java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-2434:
Attachment: HDFS-2434.002.patch

TestNameNodeMetrics.testCorruptBlock fails intermittently

Key: HDFS-2434
URL: https://issues.apache.org/jira/browse/HDFS-2434
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Uma Maheswara Rao G
Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch

java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
[jira] [Updated] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors
[ https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3920:
Attachment: HDFS-3920-005.patch

Updated the cleanup patch based on the committed changes in HDFS-3916.

libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

Key: HDFS-3920
URL: https://issues.apache.org/jira/browse/HDFS-3920
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, HDFS-3920-005.patch

1. Clean up code for string processing;
2. Use strerror consistently for error handling;
3. Use sprintf to replace decToOctal;
4. Fix other issues requiring fixing.
[jira] [Commented] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors
[ https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469968#comment-13469968 ]

Jing Zhao commented on HDFS-3920:

Colin and Andy, thank you so much for the review and comments! I'm addressing the comments now and will post an updated patch later.

libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

Key: HDFS-3920
URL: https://issues.apache.org/jira/browse/HDFS-3920
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, HDFS-3920-005.patch

1. Clean up code for string processing;
2. Use strerror consistently for error handling;
3. Use sprintf to replace decToOctal;
4. Fix other issues requiring fixing.
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3912:
Assignee: Jing Zhao (was: nkeywal)
Affects Version/s: 3.0.0
Status: Patch Available (was: Open)

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
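The first item above, making the stale-node policy adaptive to how many nodes are currently stale, can be sketched as a ratio check: avoid stale datanodes for writes only while the stale fraction stays below a configurable limit, otherwise fall back to using all nodes. This is a hedged illustration; the class, method, and threshold names below are invented for the sketch and are not Hadoop's actual API:

```java
// Illustrative policy: skip stale datanodes when placing write replicas, but only
// while few enough nodes are stale that skipping them is safe. If too many nodes
// look stale (e.g. a network blip), treat staleness as unreliable and ignore it.
final class StaleNodePolicy {
  private final double staleRatioLimit;   // e.g. 0.5; plays the role of a config knob

  StaleNodePolicy(double staleRatioLimit) {
    this.staleRatioLimit = staleRatioLimit;
  }

  /** True if writes should avoid stale nodes given the current cluster state. */
  boolean shouldAvoidStaleNodes(int staleNodes, int totalNodes) {
    if (totalNodes == 0) {
      return false;                        // nothing to choose among
    }
    return (double) staleNodes / totalNodes < staleRatioLimit;
  }
}
```

With a limit of 0.5, one stale node out of ten is avoided, but six stale nodes out of ten disable the avoidance so writes are not funneled onto too few targets.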
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3912:
Attachment: HDFS-3912.006.patch

Uploaded the patch with minor updates.

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: nkeywal
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3912:
Attachment: HDFS-3912.007.patch

DataNode#heartbeatsDisabledForTests should be declared volatile, and in the new test cases in TestReplicationPolicy I now call the heartbeatCheck() method explicitly instead of waiting.

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
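The volatile point above matters because a test-only flag like DataNode#heartbeatsDisabledForTests is written by the test thread and read by the heartbeat thread; without volatile, the Java memory model allows the reader to never observe the write. A minimal, self-contained sketch of the pattern (class and field names here are illustrative, not the DataNode's actual members):

```java
// Sketch of a cross-thread test flag. Declaring it volatile guarantees that the
// watcher thread sees the test thread's write; a plain field could be cached in a
// register and spin forever.
final class HeartbeatFlagDemo {
  private volatile boolean heartbeatsDisabled = false;  // volatile => visible across threads

  void disableHeartbeats() {
    heartbeatsDisabled = true;
  }

  /** Start a thread that spins until it observes the flag, then exits. */
  Thread startWatcher() {
    Thread t = new Thread(() -> {
      while (!heartbeatsDisabled) {
        Thread.onSpinWait();   // busy-wait until the volatile write becomes visible
      }
    });
    t.start();
    return t;
  }
}
```

The watcher terminates promptly once the flag is set, which is exactly the visibility guarantee the patch relies on.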
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3912:
Attachment: HDFS-3912.008.patch

Addressed Nicolas's comments. Now we check whether the stale interval is positive instead of printing the original warning message.

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3912:
Attachment: HDFS-3912.009.patch

Updated based on Suresh's comments.

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, HDFS-3912.009.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
[jira] [Updated] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors
[ https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3920:
Attachment: HDFS-3920-006.patch

Uploaded the patch addressing Colin and Andy's comments.

libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

Key: HDFS-3920
URL: https://issues.apache.org/jira/browse/HDFS-3920
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, HDFS-3920-005.patch, HDFS-3920-006.patch

1. Clean up code for string processing;
2. Use strerror consistently for error handling;
3. Use sprintf to replace decToOctal;
4. Fix other issues requiring fixing.
[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473459#comment-13473459 ]

Jing Zhao commented on HDFS-3912:

Hi Nicolas, I will work on the branch-1.1 patch. Hopefully I can upload it today or tomorrow. Thanks, -Jing

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, HDFS-3912.009.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3912:
Attachment: HDFS-3912-010.patch
HDFS-3912-branch-1.1-001.patch

Patch for branch-1.1. Also did some cleanup of the test code in the patch for trunk.

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473771#comment-13473771 ]

Jing Zhao commented on HDFS-3912:

For the branch-1.1 patch, I've run the tests locally and all the test cases passed.

Detecting and avoiding stale datanodes for writing

Key: HDFS-3912
URL: https://issues.apache.org/jira/browse/HDFS-3912
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch

1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
[jira] [Updated] (HDFS-4036) FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException
[ https://issues.apache.org/jira/browse/HDFS-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4036:
Attachment: HDFS-4036-trunk.001.patch

Patch uploaded.

FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException

Key: HDFS-4036
URL: https://issues.apache.org/jira/browse/HDFS-4036
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 3.0.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
Attachments: HDFS-4036-trunk.001.patch

The code in FSDirectory.unprotectedAddFile(..) does not throw UnresolvedLinkException, so we should remove UnresolvedLinkException from the throws declaration.
[jira] [Updated] (HDFS-4036) FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException
[ https://issues.apache.org/jira/browse/HDFS-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4036:
Affects Version/s: 3.0.0
Status: Patch Available (was: Open)

FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException

Key: HDFS-4036
URL: https://issues.apache.org/jira/browse/HDFS-4036
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 3.0.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
Attachments: HDFS-4036-trunk.001.patch

The code in FSDirectory.unprotectedAddFile(..) does not throw UnresolvedLinkException, so we should remove UnresolvedLinkException from the throws declaration.
[jira] [Created] (HDFS-4052) FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug information outside of the namesystem lock
Jing Zhao created HDFS-4052:

Summary: FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug information outside of the namesystem lock
Key: HDFS-4052
URL: https://issues.apache.org/jira/browse/HDFS-4052
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 1.2.0, 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor

Currently in branch-1, both FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock print debug information (which can be a long message generated by traversing a list/array) without releasing the FSNameSystem lock. It would be better to move these messages outside of the namesystem lock. This may also apply to FSNameSystem#invalidateWorkForOneNode in trunk.
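The usual fix for this pattern is to take a cheap snapshot of the relevant data while holding the lock, and build the long log message only after releasing it. A hedged, self-contained sketch (the class, lock, and block names below are illustrative, not the FSNameSystem code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of "log outside the lock": the critical section only copies the list;
// the expensive list-traversing message is built after unlock(), so logging never
// extends the time the namesystem lock is held.
final class LockScopedLogging {
  private final ReentrantLock namesystemLock = new ReentrantLock();
  private final List<String> blocksToInvalidate = new ArrayList<>();

  void addBlock(String block) {
    namesystemLock.lock();
    try {
      blocksToInvalidate.add(block);
    } finally {
      namesystemLock.unlock();
    }
  }

  /** Drains the pending blocks and returns the debug message, built lock-free. */
  String invalidateWork() {
    List<String> snapshot;
    namesystemLock.lock();
    try {
      snapshot = new ArrayList<>(blocksToInvalidate);  // cheap copy under the lock
      blocksToInvalidate.clear();
    } finally {
      namesystemLock.unlock();
    }
    // Potentially long, list-traversing message built after the lock is released.
    return "invalidating: " + String.join(", ", snapshot);
  }
}
```

The copy under the lock is O(n) pointer copies, while the string concatenation it displaces can be far more expensive for large block lists.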
[jira] [Updated] (HDFS-4052) FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug information outside of the namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4052:
Attachment: HDFS-4052.trunk.001.patch
HDFS-4052.b1.001.patch

Patches uploaded for branch-1 and trunk.

FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug information outside of the namesystem lock

Key: HDFS-4052
URL: https://issues.apache.org/jira/browse/HDFS-4052
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 1.2.0, 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Attachments: HDFS-4052.b1.001.patch, HDFS-4052.trunk.001.patch

Currently in branch-1, both FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock print debug information (which can be a long message generated by traversing a list/array) without releasing the FSNameSystem lock. It would be better to move these messages outside of the namesystem lock. This may also apply to FSNameSystem#invalidateWorkForOneNode in trunk.
[jira] [Created] (HDFS-4059) Add number of stale DataNodes to metrics
Jing Zhao created HDFS-4059:

Summary: Add number of stale DataNodes to metrics
Key: HDFS-4059
URL: https://issues.apache.org/jira/browse/HDFS-4059
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor

Add the number of stale DataNodes to metrics.
[jira] [Updated] (HDFS-4059) Add number of stale DataNodes to metrics
[ https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4059:
Attachment: HDFS-4059.trunk.001.patch

The patch includes a new test case (testStaleNodes) for TestNameNodeMetrics. In my local runs, TestNameNodeMetrics#testCorruptBlock may still fail; after applying the patch from HDFS-2434, the local tests pass.

Add number of stale DataNodes to metrics

Key: HDFS-4059
URL: https://issues.apache.org/jira/browse/HDFS-4059
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Fix For: 1.1.0, 3.0.0, 2.0.3-alpha
Attachments: HDFS-4059.trunk.001.patch

Add the number of stale DataNodes to metrics.
[jira] [Created] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
Jing Zhao created HDFS-4062:

Summary: In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
Key: HDFS-4062
URL: https://issues.apache.org/jira/browse/HDFS-4062
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor

Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long info-level log messages outside of the namesystem lock. We create this separate jira since the description and code are different for 1.x.
[jira] [Updated] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4062:
Attachment: HDFS-4062.b1.001.patch

Patch uploaded.

In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock

Key: HDFS-4062
URL: https://issues.apache.org/jira/browse/HDFS-4062
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Attachments: HDFS-4062.b1.001.patch

Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long info-level log messages outside of the namesystem lock. We create this separate jira since the description and code are different for 1.x.
[jira] [Assigned] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao reassigned HDFS-2434:
Assignee: Jing Zhao

TestNameNodeMetrics.testCorruptBlock fails intermittently

Key: HDFS-2434
URL: https://issues.apache.org/jira/browse/HDFS-2434
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
Labels: test-fail
Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch

java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
[jira] [Updated] (HDFS-4059) Add number of stale DataNodes to metrics
[ https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4059:
Attachment: HDFS-4059.trunk.002.patch

Thanks for the comments, Suresh! The testStaleNodes test case now calls the heartbeatCheck method explicitly through a new method in BlockManagerTestUtil, so that we can remove the Thread.sleep().

Add number of stale DataNodes to metrics

Key: HDFS-4059
URL: https://issues.apache.org/jira/browse/HDFS-4059
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Fix For: 1.1.0, 3.0.0, 2.0.3-alpha
Attachments: HDFS-4059.trunk.001.patch, HDFS-4059.trunk.002.patch

Add the number of stale DataNodes to metrics.
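The technique above, driving a periodic check explicitly from the test instead of sleeping until it happens to run, can be shown with a tiny monitor. This is a hedged sketch; the class and method names are illustrative stand-ins for the heartbeat monitor and the BlockManagerTestUtil hook, not Hadoop's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a stale-node monitor whose check is a plain method. A test calls
// heartbeatCheck(now) with a chosen clock value instead of Thread.sleep()-ing
// until a background thread runs it, making the test fast and deterministic.
final class HeartbeatMonitor {
  private final long staleIntervalMs;
  private final Map<String, Long> lastHeartbeat = new HashMap<>();
  private long staleNodes;   // the metric under test

  HeartbeatMonitor(long staleIntervalMs) {
    this.staleIntervalMs = staleIntervalMs;
  }

  void heartbeat(String node, long nowMs) {
    lastHeartbeat.put(node, nowMs);
  }

  /** The periodic check, exposed so tests can invoke it directly. */
  void heartbeatCheck(long nowMs) {
    staleNodes = lastHeartbeat.values().stream()
        .filter(t -> nowMs - t > staleIntervalMs)
        .count();
  }

  long getNumStaleNodes() {
    return staleNodes;
  }
}
```

Passing the clock value in as a parameter also removes the dependence on wall-clock time, the other common source of flakiness in sleep-based tests.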
[jira] [Updated] (HDFS-4059) Add number of stale DataNodes to metrics
[ https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4059:
Attachment: HDFS-4059.trunk.003.patch

Updated based on Suresh's comments.

Add number of stale DataNodes to metrics

Key: HDFS-4059
URL: https://issues.apache.org/jira/browse/HDFS-4059
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
Fix For: 1.1.0, 3.0.0, 2.0.3-alpha
Attachments: HDFS-4059.trunk.001.patch, HDFS-4059.trunk.002.patch, HDFS-4059.trunk.003.patch

Add the number of stale DataNodes to metrics.
[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-2434:
Attachment: HDFS-2434.trunk.003.patch

Made some further changes to the patch. In the testCorrupt test case, because the delete operation currently does not remove the pending record in the NameNode, the block may be deleted before the DataNode sends a block-received message back to the NameNode. In that case, the pending record cannot be removed until it times out. The new patch therefore first waits for the recovery to finish, and only then performs the deletion.

TestNameNodeMetrics.testCorruptBlock fails intermittently

Key: HDFS-2434
URL: https://issues.apache.org/jira/browse/HDFS-2434
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
Labels: test-fail
Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, HDFS-2434.trunk.003.patch

java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
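The race described in that comment is fixed by ordering: block until recovery has completed before issuing the delete. A hedged, self-contained sketch of the ordering idiom using a latch (the class and method names are invented for illustration; they do not correspond to the test's actual helpers):

```java
import java.util.concurrent.CountDownLatch;

// Sketch of "wait for recovery, then delete": the latch makes the delete
// happen strictly after recovery completes, so the delete can never race with
// the pending-replication bookkeeping described in the comment.
final class RecoveryThenDelete {
  private final CountDownLatch recoveryDone = new CountDownLatch(1);
  private boolean deleted;

  /** Called (possibly from another thread) when block recovery finishes. */
  void recoveryFinished() {
    recoveryDone.countDown();
  }

  /** Blocks until recovery is done, then performs the deletion. */
  boolean deleteAfterRecovery() throws InterruptedException {
    recoveryDone.await();   // enforce the ordering: recovery first...
    deleted = true;         // ...deletion only afterwards
    return deleted;
  }
}
```

A timed await (with a failure on timeout) would be the production-test variant, so a hung recovery fails the test instead of hanging it.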
[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing
[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3912: Attachment: HDFS-3912-branch-1.patch The patch for branch-1. Detecting and avoiding stale datanodes for writing -- Key: HDFS-3912 URL: https://issues.apache.org/jira/browse/HDFS-3912 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, HDFS-3912-branch-1.patch 1. Make stale timeout adaptive to the number of nodes marked stale in the cluster. 2. Consider having a separate configuration for write skipping the stale nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors
[ https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3920: Attachment: HDFS-3920-007.patch Colin, thanks for the comments! I've addressed most of your comments and will file another jira to fix the compile warnings (some of them are generated when compiling test code, which will be addressed in HDFS-3923). libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors --- Key: HDFS-3920 URL: https://issues.apache.org/jira/browse/HDFS-3920 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, HDFS-3920-005.patch, HDFS-3920-006.patch, HDFS-3920-007.patch 1. Clean up code for string processing; 2. Use strerror consistently for error handling; 3. Use sprintf to replace decToOctal; 4. Other issues that require fixing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-2434: Attachment: HDFS-2434.trunk.004.patch The 003 patch could not apply to trunk after the changes in HDFS-4059. Modify the patch to be consistent. TestNameNodeMetrics.testCorruptBlock fails intermittently - Key: HDFS-2434 URL: https://issues.apache.org/jira/browse/HDFS-2434 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but was:0 at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.failNotEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:126) at org.junit.Assert.assertEquals(Assert.java:470) at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-2434: Affects Version/s: 3.0.0 Status: Patch Available (was: Reopened) TestNameNodeMetrics.testCorruptBlock fails intermittently - Key: HDFS-2434 URL: https://issues.apache.org/jira/browse/HDFS-2434 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but was:0 at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.failNotEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:126) at org.junit.Assert.assertEquals(Assert.java:470) at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4059) Add number of stale DataNodes to metrics
[ https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478076#comment-13478076 ] Jing Zhao commented on HDFS-4059: - I am backporting the patch to branch-1. Since the code is different, I will create another jira for it. Add number of stale DataNodes to metrics Key: HDFS-4059 URL: https://issues.apache.org/jira/browse/HDFS-4059 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: HDFS-4059.trunk.001.patch, HDFS-4059.trunk.002.patch, HDFS-4059.trunk.003.patch Add the number of stale DataNodes to metrics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4071) Add number of stale DataNodes to metrics for Branch-1
Jing Zhao created HDFS-4071: --- Summary: Add number of stale DataNodes to metrics for Branch-1 Key: HDFS-4071 URL: https://issues.apache.org/jira/browse/HDFS-4071 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Backport HDFS-4059 to branch-1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
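The metric being added in HDFS-4059/HDFS-4071 counts DataNodes whose last heartbeat is older than a configured stale interval. A simplified, self-contained model of that gauge (StaleNodeGauge and its method names are hypothetical, not the actual Hadoop classes touched by these patches):

```java
import java.util.Arrays;
import java.util.List;

// Simplified model: a node is "stale" when its last heartbeat is older
// than the configured stale interval.
public class StaleNodeGauge {
    static long countStale(List<Long> lastHeartbeatMs, long nowMs,
                           long staleIntervalMs) {
        return lastHeartbeatMs.stream()
                .filter(hb -> nowMs - hb > staleIntervalMs)
                .count();
    }

    public static void main(String[] args) {
        long now = 100_000;
        // Three DNs: heartbeats 1 s, 40 s, and 90 s ago; 30 s stale interval.
        List<Long> heartbeats =
                Arrays.asList(now - 1_000, now - 40_000, now - 90_000);
        long stale = countStale(heartbeats, now, 30_000);
        if (stale != 2) throw new AssertionError("expected 2 stale, got " + stale);
        System.out.println("stale DataNodes: " + stale);
    }
}
```

Exposing this count as a gauge lets operators see when the stale-node avoidance logic of HDFS-3912 is actually being triggered.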
[jira] [Updated] (HDFS-4071) Add number of stale DataNodes to metrics for Branch-1
[ https://issues.apache.org/jira/browse/HDFS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4071: Attachment: HDFS-4059-backport-branch-1.001.patch To avoid bringing extra complexity to TestNameNodeMetrics when changing the number of stale nodes in MiniDFSCluster test, I put the test in TestReplicationPolicy. Add number of stale DataNodes to metrics for Branch-1 - Key: HDFS-4071 URL: https://issues.apache.org/jira/browse/HDFS-4071 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Affects Versions: 1.2.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 1.1.0, 3.0.0, 2.0.3-alpha Attachments: HDFS-4059-backport-branch-1.001.patch Backport HDFS-4059 to branch-1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3953) DFSOutputStream constructor does not use bufferSize parameter
[ https://issues.apache.org/jira/browse/HDFS-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-3953. - Resolution: Duplicate Duplicate of HDFS-4070 DFSOutputStream constructor does not use bufferSize parameter - Key: HDFS-3953 URL: https://issues.apache.org/jira/browse/HDFS-3953 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao The DFSOutputStream constructor does not use the bufferSize parameter. However, a buffer size is always passed in by many other methods defined in DFSClient, DistributedFileSystem, and Hdfs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
Jing Zhao created HDFS-4072: --- Summary: When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Attachment: TestPendingAndDelete.java The attached test may generate the scenario where a pendingReplication record is left in BlockManager#pendingReplications until it times out. When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Attachment: HDFS-4072.trunk.001.patch And a simple patch uploaded. When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4072.trunk.001.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Status: Patch Available (was: Open) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4072.trunk.001.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Attachment: HDFS-4072.trunk.002.patch Thanks for the comments, Suresh. The modified patch is uploaded. When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4073) Two minor improvements to FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4073: Attachment: HDFS-4073.trunk.001.patch Patch uploaded. Two minor improvements to FSDirectory - Key: HDFS-4073 URL: https://issues.apache.org/jira/browse/HDFS-4073 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 3.0.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4073.trunk.001.patch - Add a debug log message to FSDirectory.unprotectedAddFile(..) for the caught IOException. - Remove throw UnresolvedLinkException from addToParent(..). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4073) Two minor improvements to FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4073: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) Two minor improvements to FSDirectory - Key: HDFS-4073 URL: https://issues.apache.org/jira/browse/HDFS-4073 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 3.0.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4073.trunk.001.patch - Add a debug log message to FSDirectory.unprotectedAddFile(..) for the caught IOException. - Remove throw UnresolvedLinkException from addToParent(..). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478640#comment-13478640 ] Jing Zhao commented on HDFS-4072: - Thanks for the comment, Eli! I think you're right: PendingReplicationBlocks#remove only decrements the pending replication count by 1; it does not remove the whole record. So we only need to remove the whole record for the block from PendingReplicationBlocks here, and we can still do this operation in BlockManager#removeBlock(). When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
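The distinction in the comment above — decrementing a pending-replication count by one versus dropping the whole record on file deletion — can be modeled with a small sketch. This is a hypothetical, self-contained model (PendingModel and its method names are illustrative), not the real PendingReplicationBlocks implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model: decrement(block) mimics one replica finishing (like
// PendingReplicationBlocks#remove), while removeAllRecords(block) drops
// the entire entry, which is what file deletion needs.
public class PendingModel {
    private final Map<String, Integer> pending = new HashMap<>();

    void increment(String block, int targets) {
        pending.merge(block, targets, Integer::sum);
    }

    // One scheduled replication completed: count goes down by one.
    void decrement(String block) {
        pending.computeIfPresent(block, (b, n) -> n > 1 ? n - 1 : null);
    }

    // File deleted: forget the block entirely, right now.
    void removeAllRecords(String block) {
        pending.remove(block);
    }

    int count(String block) {
        return pending.getOrDefault(block, 0);
    }

    public static void main(String[] args) {
        PendingModel pm = new PendingModel();
        pm.increment("blk_1", 2);     // two replications scheduled
        pm.decrement("blk_1");        // one finished
        if (pm.count("blk_1") != 1) throw new AssertionError();
        pm.removeAllRecords("blk_1"); // deletion drops the record now,
                                      // not after the 5~10 minute timeout
        if (pm.count("blk_1") != 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Without the full removal, the leftover entry would only disappear when the pending-replication timeout sweep runs, which is the intermittent-failure window the test attachments above demonstrate.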
[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Attachment: HDFS-4072.trunk.003.patch Updated patch. When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, HDFS-4072.trunk.003.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Attachment: HDFS-4072.trunk.004.patch Eli, thanks for the advice. To address your comments, I made two replicas corrupt and checked that the pending replication count is 2 in the new test case. When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications - Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, HDFS-4072.trunk.003.patch, HDFS-4072.trunk.004.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479455#comment-13479455 ] Jing Zhao commented on HDFS-4062: - test-patch output: -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 222 new Findbugs (version 2.0.1) warnings. In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock --- Key: HDFS-4062 URL: https://issues.apache.org/jira/browse/HDFS-4062 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4062.b1.001.patch Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print their long info-level log messages outside of the namesystem lock. We created this separate jira since the description and code are different for 1.x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4072) On file deletion remove corresponding blocks pending replication
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4072: Attachment: HDFS-4072.b1.001.patch Branch-1 patch. Will run test-patch for it. On file deletion remove corresponding blocks pending replication Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: HDFS-4072.b1.001.patch, HDFS-4072.patch, HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, HDFS-4072.trunk.003.patch, HDFS-4072.trunk.004.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-2434: Attachment: HDFS-2434.trunk.005.patch Update the patch based on the change in HDFS-4072. TestNameNodeMetrics.testCorruptBlock fails intermittently - Key: HDFS-2434 URL: https://issues.apache.org/jira/browse/HDFS-2434 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch, HDFS-2434.trunk.005.patch java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but was:0 at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.failNotEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:126) at org.junit.Assert.assertEquals(Assert.java:470) at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4072) On file deletion remove corresponding blocks pending replication
[ https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480427#comment-13480427 ] Jing Zhao commented on HDFS-4072: - test-patch result for branch-1 patch: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 222 new Findbugs (version 2.0.1) warnings. On file deletion remove corresponding blocks pending replication Key: HDFS-4072 URL: https://issues.apache.org/jira/browse/HDFS-4072 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: HDFS-4072.b1.001.patch, HDFS-4072.patch, HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, HDFS-4072.trunk.003.patch, HDFS-4072.trunk.004.patch, TestPendingAndDelete.java Currently when deleting a file, blockManager does not remove records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4093) In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3.
Jing Zhao created HDFS-4093: --- Summary: In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3. Key: HDFS-4093 URL: https://issues.apache.org/jira/browse/HDFS-4093 Project: Hadoop HDFS Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao In branch-1-win, when AzureBlockPlacementPolicy (which extends BlockPlacementPolicyDefault) is used, if the client increases the number of replicas (e.g., from 3 to 10), AzureBlockPlacementPolicy#chooseTarget returns only 1 Datanode each time. Thus in FSNameSystem#computeReplicationWorkForBlock, it is possible that the replication monitor chooses a datanode that has already been chosen as a target and is still in pendingReplications (because computeReplicationWorkForBlock does not check pending replications before calling chooseTarget). To avoid this hit-the-same-datanode scenario, we modify AzureBlockPlacementPolicy#chooseTarget to make it return multiple DNs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4093) In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3.
[ https://issues.apache.org/jira/browse/HDFS-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4093: Attachment: HDFS-b1-win-4093.001.patch Patch uploaded. In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3. --- Key: HDFS-4093 URL: https://issues.apache.org/jira/browse/HDFS-4093 Project: Hadoop HDFS Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-b1-win-4093.001.patch In branch-1-win, when AzureBlockPlacementPolicy (which extends BlockPlacementPolicyDefault) is used, if the client increases the number of replicas (e.g., from 3 to 10), AzureBlockPlacementPolicy#chooseTarget returns only 1 Datanode each time. Thus in FSNameSystem#computeReplicationWorkForBlock, it is possible that the replication monitor chooses a datanode that has already been chosen as a target and is still in pendingReplications (because computeReplicationWorkForBlock does not check pending replications before calling chooseTarget). To avoid this hit-the-same-datanode scenario, we modify AzureBlockPlacementPolicy#chooseTarget to make it return multiple DNs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4061) TestBalancer and TestUnderReplicatedBlocks need timeouts
[ https://issues.apache.org/jira/browse/HDFS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480523#comment-13480523 ] Jing Zhao commented on HDFS-4061: - Nicholas, I checked the test output and my guess is that the test failure is caused by this: when the NameNode invalidates a block for a datanode D1 and removes the datanode-block pair from the blockMap, and before the invalidation request is sent to D1, BlockManager#computeDataNodeWork also runs and schedules a replication to D1. So the invalidation and replication requests are sent to D1 at the same time. D1 then ignores the replication request (throwing a ReplicaAlreadyExistsException) and deletes the replica. Thus the NN cannot receive the block-received message from D1, and the test case times out in 5 min, which is smaller than the timeout of the PendingReplication record (usually 5~10 min). I can file another jira to fix the test case if you think this analysis is correct. TestBalancer and TestUnderReplicatedBlocks need timeouts Key: HDFS-4061 URL: https://issues.apache.org/jira/browse/HDFS-4061 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.0.3-alpha Attachments: hdfs-4061.txt Saw TestBalancer and TestUnderReplicatedBlocks time out hard on a jenkins job recently; let's annotate the relevant tests with timeouts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-4067: --- Assignee: Jing Zhao TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480529#comment-13480529 ] Jing Zhao commented on HDFS-4067: - Moving the discussion from HDFS-4061 here: when the NameNode invalidates a block for a datanode D1 and removes the datanode-block pair from the blockMap, and before the invalidation request is sent to D1, BlockManager#computeDataNodeWork also starts working and schedules a replication to D1. So the invalidation and replication requests will be sent to D1 at the same time. D1 will then ignore the replication request (also throwing a ReplicaAlreadyExistsException) and delete the replica. Thus the NN cannot receive the blockReceived message from D1, and the test case will time out in 5 minutes, which is shorter than the timeout of a PendingReplication request (usually 5~10 minutes). TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. 
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480535#comment-13480535 ] Jing Zhao commented on HDFS-4067: - And I guess that's also the reason for HDFS-342? Since the initial replication request is ignored, the replication on D1 can only be done after the pending replication timeout. TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4067: Attachment: HDFS-4067.trunk.001.patch Initial patch to fix. TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-4067.trunk.001.patch After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4095) Add snapshot related metrics
Jing Zhao created HDFS-4095: --- Summary: Add snapshot related metrics Key: HDFS-4095 URL: https://issues.apache.org/jira/browse/HDFS-4095 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Add metrics for number of snapshots in the system, including 1) number of snapshot files, and 2) number of snapshot only files (snapshot file that are not deleted but the original file is already deleted). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4096) Add snapshot information to namenode WebUI
Jing Zhao created HDFS-4096: --- Summary: Add snapshot information to namenode WebUI Key: HDFS-4096 URL: https://issues.apache.org/jira/browse/HDFS-4096 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Add snapshot information to namenode WebUI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4095) Add snapshot related metrics
[ https://issues.apache.org/jira/browse/HDFS-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4095: Attachment: HDFS-4095.001.patch Initial patch defining a group of snapshot-related metrics. Add snapshot related metrics Key: HDFS-4095 URL: https://issues.apache.org/jira/browse/HDFS-4095 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4095.001.patch Add metrics for number of snapshots in the system, including 1) number of snapshot files, and 2) number of snapshot only files (snapshot file that are not deleted but the original file is already deleted). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4096) Add snapshot information to namenode WebUI
[ https://issues.apache.org/jira/browse/HDFS-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4096: Attachment: HDFS-4096.relative.001.patch Initial patch that only adds snapshot-related stats summary to NN WebUI. Add snapshot information to namenode WebUI -- Key: HDFS-4096 URL: https://issues.apache.org/jira/browse/HDFS-4096 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4096.relative.001.patch Add snapshot information to namenode WebUI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4106: Attachment: HDFS-4106-trunk.001.patch BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile -- Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the happens-before consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
Jing Zhao created HDFS-4106: --- Summary: BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the happens-before consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4106: Status: Patch Available (was: Open) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile -- Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the happens-before consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4067: Status: Patch Available (was: Open) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-4067.trunk.001.patch After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482105#comment-13482105 ] Jing Zhao commented on HDFS-3616: - Also got this exception in HDFS-4106. It seems the exception happens because one thread is iterating over the HashMap bpSlices (FsVolumeImpl#shutdown) while another thread is removing entries from the same HashMap (FsVolumeImpl#shutdownBlockPool). A quick fix would be to change bpSlices from a HashMap to a ConcurrentHashMap. TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown -- Key: HDFS-3616 URL: https://issues.apache.org/jira/browse/HDFS-3616 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Uma Maheswara Rao G I have seen this in precommit build #2743 {noformat} java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$EntryIterator.next(HashMap.java:834) at java.util.HashMap$EntryIterator.next(HashMap.java:832) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
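The failure mode described in the comment above is easy to reproduce in isolation: a fail-fast HashMap iterator throws ConcurrentModificationException when entries are removed behind its back, while ConcurrentHashMap's weakly consistent iterator tolerates the removals. A minimal sketch (the class and map contents are illustrative, not the actual FsVolumeImpl#bpSlices code):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BpSlicesDemo {
    // Iterate over the map while removing each entry, mimicking one thread
    // walking bpSlices in shutdown() while another removes entries in
    // shutdownBlockPool(). Returns true if iteration survives the removals.
    static boolean iterateWhileRemoving(Map<String, String> bpSlices) {
        bpSlices.put("BP-1", "slice1");
        bpSlices.put("BP-2", "slice2");
        try {
            for (String bpid : bpSlices.keySet()) {
                bpSlices.remove(bpid); // structural modification mid-iteration
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false; // HashMap's fail-fast iterator detects the change
        }
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileRemoving(new HashMap<>()));           // fail-fast HashMap
        System.out.println(iterateWhileRemoving(new ConcurrentHashMap<>())); // weakly consistent iterator
    }
}
```

Swapping the field's type to ConcurrentHashMap, as the comment proposes, makes the iteration tolerant of concurrent removals without changing any call sites.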
[jira] [Assigned] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-3616: --- Assignee: Jing Zhao TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown -- Key: HDFS-3616 URL: https://issues.apache.org/jira/browse/HDFS-3616 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao I have seen this in precommit build #2743 {noformat} java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$EntryIterator.next(HashMap.java:834) at java.util.HashMap$EntryIterator.next(HashMap.java:832) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482108#comment-13482108 ] Jing Zhao commented on HDFS-4106: - Failing testcases are related to HDFS-3616 (TestWebHdfsWithMultipleNameNodes) and HDFS-4067 (TestUnderReplicatedBlocks). BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile -- Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the happens-before consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482131#comment-13482131 ] Jing Zhao commented on HDFS-4067: - testcase failure reported in HDFS-3948 before. Will run TestUnderReplicatedBlocks in loop later. TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-4067.trunk.001.patch After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482461#comment-13482461 ] Jing Zhao commented on HDFS-2434: - Have run the testcase 551 times locally and all of them passed. TestNameNodeMetrics.testCorruptBlock fails intermittently - Key: HDFS-2434 URL: https://issues.apache.org/jira/browse/HDFS-2434 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch, HDFS-2434.trunk.005.patch java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but was:0 at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.failNotEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:126) at org.junit.Assert.assertEquals(Assert.java:470) at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3616: Attachment: HDFS-3616.trunk.001.patch After checking the code, I guess the exception is caused by this process:
1. In DataNode#shutdown(), DataNode#shouldRun is set to false.
2. BPServiceActor#run() stops running and runs BPServiceActor#cleanUp().
3. While executing BPServiceActor#cleanUp(), DataNode#shutdownBlockPool() is called, where blockPoolManager.remove(bpos) is executed before this.blockPoolManager.shutDownAll() is called in DataNode#shutdown(). Thus the corresponding BPOfferService cannot be seen and shut down by blockPoolManager#shutDownAll(), since it has been removed from BlockPoolManager#offerServices.
4. The actor thread continues running DataNode#shutdownBlockPool(), which finally tries to remove records from FsVolumeImpl#bpSlices, while the DataNode shutdown thread runs FsVolumeImpl#shutdown(), which iterates over bpSlices. Thus a ConcurrentModificationException may be thrown.
So, to avoid changing other code, maybe we can simply change bpSlices from HashMap to ConcurrentHashMap? A simple patch based on this is attached.
TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown -- Key: HDFS-3616 URL: https://issues.apache.org/jira/browse/HDFS-3616 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Attachments: HDFS-3616.trunk.001.patch I have seen this in precommit build #2743 {noformat} java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$EntryIterator.next(HashMap.java:834) at java.util.HashMap$EntryIterator.next(HashMap.java:832) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482772#comment-13482772 ] Jing Zhao commented on HDFS-4106: - Thanks for the comments Brandon! So the cost of a volatile read/write may be an extra memory access. For a BPServiceActor thread which communicates with the NN periodically, I think this should not cause a performance problem (also considering that variables like lastHeartbeat are not accessed a lot). Without the volatile keyword, it is possible that triggerHeartbeatForTests cannot trigger the heartbeat as it intends to, since the change of lastHeartbeat may not be seen by the actor thread. Also, the testing thread may wait for an unknown period of time because the change of lastHeartbeat by the actor thread may not be seen by the testing thread. BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile -- Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the happens-before consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
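The pattern under discussion can be sketched as follows, with hypothetical names modeled on BPServiceActor (this is a simplified illustration, not the actual Hadoop code): the timestamp is declared volatile so that a write by the testing thread in triggerHeartbeatForTests() is guaranteed, by the happens-before rule for volatile variables, to be visible to the actor thread's next read, and vice versa.

```java
// Sketch only: field and method names are modeled on BPServiceActor but simplified.
class ActorSketch {
    // volatile gives each write a happens-before edge to subsequent reads,
    // so the testing thread and the actor thread always see fresh values.
    private volatile long lastHeartbeat;
    private final long heartbeatIntervalMs;

    ActorSketch(long heartbeatIntervalMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.lastHeartbeat = System.currentTimeMillis();
    }

    // Called from a testing thread to force the next loop iteration to heartbeat.
    void triggerHeartbeatForTests() {
        lastHeartbeat = 0; // without volatile, the actor thread might never observe this write
    }

    // Called by the actor thread on each iteration of its offer-service loop.
    boolean heartbeatDue(long now) {
        return now - lastHeartbeat >= heartbeatIntervalMs;
    }
}
```

The cost Brandon raised amounts to the access going to memory rather than staying in a register; for a value touched a few times per heartbeat interval that is negligible.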
[jira] [Updated] (HDFS-4093) In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3.
[ https://issues.apache.org/jira/browse/HDFS-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4093: Attachment: HDFS-b1-win-4093.002.patch Updated the patch. Have passed local testcases. In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3. --- Key: HDFS-4093 URL: https://issues.apache.org/jira/browse/HDFS-4093 Project: Hadoop HDFS Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-b1-win-4093.001.patch, HDFS-b1-win-4093.002.patch In branch-1-win, when AzureBlockPlacementPolicy (which extends the BlockPlacementPolicyDefault) is used, if the client increases the number of replicas (e.g., from 3 to 10), AzureBlockPlacementPolicy#chooseTarget will return only 1 Datanode each time. Thus in FSNameSystem#computeReplicationWorkForBlock, it is possible that the replication monitor may choose a datanode that has been chosen as target but still in the pendingReplications (because computeReplicationWorkForBlock does not check the pending replication before doing the chooseTarget). To avoid this hit-the-same-datanode scenario, we modify the AzureBlockPlacementPolicy#chooseTarget to make it return multiple DN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4106: Attachment: HDFS-4106-trunk.002.patch Updated based on Brandon's comments. BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile -- Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch, HDFS-4106-trunk.002.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the happens-before consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3616: Attachment: HDFS-3616.trunk.002.patch After discussing with Nicholas, we think that to avoid the ConcurrentModificationException we only need to keep a copy of BlockPoolManager#offerServices before setting DataNode#shouldRun to false. In that case, BlockPoolManager#shutDownAll() can still access and shut down all the actor threads, so no concurrent access of the bpSlices happens anymore. Uploaded a patch based on this. TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown -- Key: HDFS-3616 URL: https://issues.apache.org/jira/browse/HDFS-3616 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Attachments: HDFS-3616.trunk.001.patch, HDFS-3616.trunk.002.patch I have seen this in precommit build #2743 {noformat} java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$EntryIterator.next(HashMap.java:834) at java.util.HashMap$EntryIterator.next(HashMap.java:832) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
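The copy-before-iterate idea behind the HDFS-3616 patch can be shown in miniature (illustrative names only, not the actual BlockPoolManager code): iterate over a snapshot of the collection so that concurrent mutation of the live collection cannot trigger a ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: mirrors the idea of copying BlockPoolManager#offerServices
// before shutdown, so later mutation of the live collection cannot break iteration.
class SnapshotShutdown {
    private final Set<String> services = new HashSet<>();

    void register(String service) { services.add(service); }

    // Take a snapshot first; iterating the copy is safe even if shutdown of
    // each service removes it from the live set mid-loop.
    List<String> shutdownAll() {
        List<String> snapshot = new ArrayList<>(services);
        List<String> stopped = new ArrayList<>();
        for (String s : snapshot) {
            services.remove(s); // mutating the live set is harmless: we iterate the copy
            stopped.add(s);
        }
        return stopped;
    }

    int remaining() { return services.size(); }
}
```

Iterating `services` directly while removing entries in the loop body would fail fast with a ConcurrentModificationException, which is the crash seen in the stack trace above.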
[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3616: Status: Patch Available (was: Open) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown -- Key: HDFS-3616 URL: https://issues.apache.org/jira/browse/HDFS-3616 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Attachments: HDFS-3616.trunk.001.patch, HDFS-3616.trunk.002.patch I have seen this in precommit build #2743 {noformat} java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$EntryIterator.next(HashMap.java:834) at java.util.HashMap$EntryIterator.next(HashMap.java:832) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3616: Attachment: HDFS-3616.trunk.003.patch Need to check whether blockPoolManager is null before calling getAllNamenodeThreads(). TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown -- Key: HDFS-3616 URL: https://issues.apache.org/jira/browse/HDFS-3616 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Attachments: HDFS-3616.trunk.001.patch, HDFS-3616.trunk.002.patch, HDFS-3616.trunk.003.patch I have seen this in precommit build #2743 {noformat} java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$EntryIterator.next(HashMap.java:834) at java.util.HashMap$EntryIterator.next(HashMap.java:832) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483395#comment-13483395 ] Jing Zhao commented on HDFS-4067: - Ran the test case ~800 times and all runs passed. TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException --- Key: HDFS-4067 URL: https://issues.apache.org/jira/browse/HDFS-4067 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Labels: test-fail Attachments: HDFS-4067.trunk.001.patch After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException: {noformat} org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3948: Attachment: HDFS-3948-regenerate-exception.patch Also got this exception in HDFS-3616 and HDFS-4067. After checking the code, I guess this exception may be caused by the following sequence: 1. An FSDataOutputStream instance (out4) is created through WebHdfsFileSystem#create, in order to create and write a new file. 2. The request is redirected to a DN, where DFSClient#create is called to create the file in the NN through RPC. 3. Meanwhile, the test has called MiniDfsCluster#shutdownNameNode, and in NameNode#stop() the FSNamesystem has been shut down (closing the FSEditLog) but the RPC server has not been closed yet. 4. The RPC request from the DN is sent to the NN and FSEditLog#logEdit is called for the creation. But by this time the FSEditLog has already been closed and FSEditLog#editLogStream has been set to null. Therefore, if assertions are enabled, a bad state (CLOSED) is finally returned to the client (the case in HDFS-3948); if assertions are not enabled, then because FSEditLog#editLogStream has been set to null, an NPE is returned, as reported in HDFS-3822. The attached patch can regenerate the exception. TestWebHDFS#testNamenodeRestart is racy Key: HDFS-3948 URL: https://issues.apache.org/jira/browse/HDFS-3948 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Attachments: HDFS-3948-regenerate-exception.patch After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails when looping it, on my system it takes about 40 runs. WebHdfsFileSystem#close is racing with restart and resulting in an add block after the edit log is closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-3948: --- Assignee: Jing Zhao TestWebHDFS#testNamenodeRestart is racy Key: HDFS-3948 URL: https://issues.apache.org/jira/browse/HDFS-3948 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Attachments: HDFS-3948-regenerate-exception.patch After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails when looping it, on my system it takes about 40 runs. WebHdfsFileSystem#close is racing with restart and resulting in an add block after the edit log is closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483646#comment-13483646 ] Jing Zhao commented on HDFS-3948: - Correction: HDFS-3822's NPE should be caused by BlockManager race, as Eli commented. TestWebHDFS#testNamenodeRestart is racy Key: HDFS-3948 URL: https://issues.apache.org/jira/browse/HDFS-3948 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Attachments: HDFS-3948-regenerate-exception.patch After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails when looping it, on my system it takes about 40 runs. WebHdfsFileSystem#close is racing with restart and resulting in an add block after the edit log is closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3948: Attachment: HDFS-3948-regenerate-exception.002.patch The prior patch actually generates the exception while the NN is executing FSNamesystem#startFile. The new patch generates the same exception while the NN is executing FSNamesystem#allocateBlock (the same as the one reported in HDFS-3948). TestWebHDFS#testNamenodeRestart is racy Key: HDFS-3948 URL: https://issues.apache.org/jira/browse/HDFS-3948 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Attachments: HDFS-3948-regenerate-exception.002.patch, HDFS-3948-regenerate-exception.patch After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails when looping it, on my system it takes about 40 runs. WebHdfsFileSystem#close is racing with restart and resulting in an add block after the edit log is closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3948: Attachment: HDFS-3948.001.patch Initial patch to fix the test case. Because webhdfs does not support hflush, it is difficult to avoid the race between the webhdfs write at the DN and the NN's shutdown. Thus in this patch, I close the corresponding FSDataOutputStream (instead of calling its hflush) when the test case runs for webhdfs. TestWebHDFS#testNamenodeRestart is racy Key: HDFS-3948 URL: https://issues.apache.org/jira/browse/HDFS-3948 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jing Zhao Attachments: HDFS-3948.001.patch, HDFS-3948-regenerate-exception.002.patch, HDFS-3948-regenerate-exception.patch After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails when looping it, on my system it takes about 40 runs. WebHdfsFileSystem#close is racing with restart and resulting in an add block after the edit log is closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4115) TestHDFSCLI.testAll fails one test due to number format
[ https://issues.apache.org/jira/browse/HDFS-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485281#comment-13485281 ] Jing Zhao commented on HDFS-4115: - The patch looks good. +1 for the patch. TestHDFSCLI.testAll fails one test due to number format --- Key: HDFS-4115 URL: https://issues.apache.org/jira/browse/HDFS-4115 Project: Hadoop HDFS Issue Type: Bug Components: test Environment: Apache Maven 3.0.4 Maven home: /usr/share/maven Java version: 1.6.0_35, vendor: Sun Microsystems Inc. Java home: /usr/lib/jvm/j2sdk1.6-oracle/jre Default locale: en_US, platform encoding: ISO-8859-1 OS name: linux, version: 3.2.0-32-generic, arch: amd64, family: unix Reporter: Trevor Robinson Assignee: Trevor Robinson Attachments: HDFS-4115.patch This test fails repeatedly on only one of my machines: {noformat} Failed tests: testAll(org.apache.hadoop.cli.TestHDFSCLI): One of the tests failed. See the Detailed results to identify the command that failed Test ID: [587] Test Description: [report: Displays the report about the Datanodes] Test Commands: [-fs hdfs://localhost:35254 -report] Comparator: [RegexpComparator] Comparision result: [fail] Expected output: [Configured Capacity: [0-9]+ \([0-9]+\.[0-9]+ [BKMGT]+\)] Actual output: [Configured Capacity: 472446337024 (440 GB) {noformat} The problem appears to be that {{StringUtils.byteDesc}} calls {{limitDecimalTo2}} which calls {{DecimalFormat.format}} with a pattern of {{#.##}}. This pattern does not include trailing zeroes, so the expected regex is incorrect in requiring a decimal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
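The DecimalFormat behavior Trevor describes is easy to confirm: the `#.##` pattern drops trailing zeroes, so whole-number capacities format with no decimal point at all, and the expected regex must make the fractional part optional. The sketch below pins the locale to US for determinism (the Hadoop code uses the platform default locale) and shows a tolerant version of the regex in the spirit of the patch:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

class ByteDescDemo {
    // Same pattern StringUtils.limitDecimalTo2 uses; locale pinned for this demo.
    static final DecimalFormat DF =
        new DecimalFormat("#.##", DecimalFormatSymbols.getInstance(Locale.US));

    static String fmt(double d) { return DF.format(d); }

    // A regex that tolerates a missing fractional part, unlike the failing
    // test's "[0-9]+\.[0-9]+ [BKMGT]+" which required a decimal point.
    static boolean matchesCapacity(String line) {
        return Pattern.matches(
            "Configured Capacity: [0-9]+ \\([0-9]+(\\.[0-9]+)? [BKMGT]+\\)", line);
    }
}
```

`fmt(440.0)` yields `"440"` with no trailing `.0`, which is why the original expected regex failed on this machine's round capacity value.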
[jira] [Created] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable it to return more information other than the INode array
Jing Zhao created HDFS-4124: --- Summary: Refactor INodeDirectory#getExistingPathINodes() to enable it to return more information other than the INode array Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4124: Attachment: HDFS-INodeDirecotry.trunk.001.patch Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4124: Attachment: (was: HDFS-INodeDirecotry.trunk.001.patch) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4124: Attachment: HDFS-INodeDirecotry.trunk.001.patch Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4124: Status: Patch Available (was: Open) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4124: Attachment: HDFS-INodeDirecotry.trunk.002.patch Thanks for the comments Suresh! So in the new patch I change the method signature to INodesInPath getExistingPathINodes(byte[][] components, int numOfINodes, boolean resolveLink), where the parameter numOfINodes indicates the number of INodes expected to be returned. Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch, HDFS-INodeDirecotry.trunk.002.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
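As a rough sketch of the idea (the real INodesInPath in the patch carries more resolution state; the class and members below are hypothetical), numOfINodes simply bounds the capacity of the INode array the holder returns:

```java
// Hypothetical, simplified holder: the actual INodesInPath in the patch wraps
// more resolution state. numOfINodes caps how many resolved INodes the caller
// asked for, which may be fewer than the number of path components.
class INodesInPathSketch {
    private final String[] inodes; // stand-in for INode[]
    private int resolved = 0;

    INodesInPathSketch(int numOfINodes) { inodes = new String[numOfINodes]; }

    // Record the next resolved INode, silently ignoring overflow past capacity.
    void add(String inode) {
        if (resolved < inodes.length) inodes[resolved++] = inode;
    }

    int capacity() { return inodes.length; }
    int resolvedCount() { return resolved; }
}
```

Returning such an object instead of a bare INode[] is what lets later snapshot work attach extra information to the resolution result without changing every caller again.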
[jira] [Commented] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485837#comment-13485837 ] Jing Zhao commented on HDFS-4124: - We may want to include a number indicating the actual number of elements in INodesInPath. Currently in every place that calls getExistingPathINodes, the capacity of INodesInPath's INode array is always equal to the size of components, thus implementing INodesInPath without this number seems fine. (However, based on the logic in getExistingPathINodes, the capacity of INodesInPath is allowed to be larger than the size of components. Thus I guess we may add this number later or in the snapshot branch first.) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch, HDFS-INodeDirecotry.trunk.002.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485862#comment-13485862 ] Jing Zhao commented on HDFS-4124: - The test failure has been reported in HDFS-3267 and HDFS-3538. Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch, HDFS-INodeDirecotry.trunk.002.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
[ https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4124: Attachment: HDFS-INodeDirecotry.trunk.003.patch Update the javadoc for INodesInPath. Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array - Key: HDFS-4124 URL: https://issues.apache.org/jira/browse/HDFS-4124 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-INodeDirecotry.trunk.001.patch, HDFS-INodeDirecotry.trunk.002.patch, HDFS-INodeDirecotry.trunk.003.patch Currently INodeDirectory#getExistingPathINodes() uses an INode array to return the INodes resolved from the given path. For snapshot we need the function to be able to return more information when resolving a path for a snapshot file/dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4127) Log message is not correct in case of short of replica
[ https://issues.apache.org/jira/browse/HDFS-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486345#comment-13486345 ] Jing Zhao commented on HDFS-4127: - Hi Junping, so after you make datanode 01 not qualified for choosing, I think you may also need to reset these two nodes back to a healthy state at the end of the new test case; otherwise the two unqualified nodes will affect subsequent test cases. Another thing is, when calculating the number of replicas still needed for the log output, do we also need to consider the original size of the results? For example, when calling chooseTarget, if there are already 3 nodes chosen (i.e., we want to increase the number of replicas from 3 to N), then after selecting another S nodes, totalReplicasExpected should be N-3, the size of the results is 3+S, and we should expect (N-3)-(3+S)+3 = N-3-S more nodes. Log message is not correct in case of short of replica -- Key: HDFS-4127 URL: https://issues.apache.org/jira/browse/HDFS-4127 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.4, 2.0.2-alpha Reporter: Junping Du Assignee: Junping Du Priority: Minor Attachments: HDFS-4127.patch For some reason that block cannot be placed with enough replica (like no enough available data nodes), it will throw a warning with wrong number of replica in short. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
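The replica accounting in the comment above can be written out as a tiny illustrative helper (not Hadoop code): with N total replicas wanted and 3 already present, totalReplicasExpected is N-3; after S more targets are chosen the results list holds 3+S entries, leaving N-3-S still needed.

```java
// Illustrative only: mirrors the accounting discussed for the log message.
class ReplicaMath {
    // totalReplicasExpected: extra replicas requested from chooseTarget (N - 3).
    // resultsSize: current size of the results list (3 + S).
    // initialResults: nodes already in results before chooseTarget ran (3).
    static int stillNeeded(int totalReplicasExpected, int resultsSize,
                           int initialResults) {
        // (N - 3) - ((3 + S) - 3) == N - 3 - S
        return totalReplicasExpected - (resultsSize - initialResults);
    }
}
```

For N = 10 and S = 2 this yields 10 - 3 - 2 = 5 nodes still needed, rather than the count the original warning printed.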
[jira] [Commented] (HDFS-4127) Log message is not correct in case of short of replica
[ https://issues.apache.org/jira/browse/HDFS-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486393#comment-13486393 ] Jing Zhao commented on HDFS-4127: - Yeah, I think that will be good. (And in that case the meaning of totalReplicasExpected will be the total number of replicas expected by the system, not the total number of extra replicas expected for the chooseTarget method.) Log message is not correct in case of short of replica -- Key: HDFS-4127 URL: https://issues.apache.org/jira/browse/HDFS-4127 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.4, 2.0.2-alpha Reporter: Junping Du Assignee: Junping Du Priority: Minor Attachments: HDFS-4127.patch For some reason that block cannot be placed with enough replica (like no enough available data nodes), it will throw a warning with wrong number of replica in short. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4127) Log message is not correct in case of short of replica
[ https://issues.apache.org/jira/browse/HDFS-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486667#comment-13486667 ] Jing Zhao commented on HDFS-4127: - The new patch looks good. +1 for the patch. Log message is not correct in case of short of replica -- Key: HDFS-4127 URL: https://issues.apache.org/jira/browse/HDFS-4127 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.4, 2.0.2-alpha Reporter: Junping Du Assignee: Junping Du Priority: Minor Attachments: HDFS-4127.patch, HDFS-4127.patch For some reason that block cannot be placed with enough replica (like no enough available data nodes), it will throw a warning with wrong number of replica in short. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4118) Change INodeDirectory.getExistingPathINodes(..) to work with snapshots
[ https://issues.apache.org/jira/browse/HDFS-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4118: Attachment: HDFS-4118.001.patch Patch uploaded. The current patch also contains several testcases to test getExistingPathINodes under different scenarios. Change INodeDirectory.getExistingPathINodes(..) to work with snapshots -- Key: HDFS-4118 URL: https://issues.apache.org/jira/browse/HDFS-4118 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Jing Zhao Attachments: HDFS-4118.001.patch {code} int getExistingPathINodes(byte[][] components, INode[] existing, boolean resolveLink) {code} The INodeDirectory above retrieves existing INodes from the given path components. It needs to be updated in order to understand snapshot paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3923) libwebhdfs testing code cleanup
[ https://issues.apache.org/jira/browse/HDFS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3923: Attachment: HDFS-3923.002.patch Updated the patch. libwebhdfs testing code cleanup --- Key: HDFS-3923 URL: https://issues.apache.org/jira/browse/HDFS-3923 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-3923.001.patch, HDFS-3923.002.patch 1. Testing code cleanup for libwebhdfs 1.1 Tests should generate a test-specific filename and should use TMPDIR appropriately. 2. Enable automated testing
[jira] [Commented] (HDFS-4132) when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory
[ https://issues.apache.org/jira/browse/HDFS-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488387#comment-13488387 ] Jing Zhao commented on HDFS-4132: - That's a bug introduced by HDFS-3923. Thanks for the fix Colin! when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory --- Key: HDFS-4132 URL: https://issues.apache.org/jira/browse/HDFS-4132 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4132.001.patch When libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory. Details: jconfStr is declared uninitialized...
{code}
struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
{
    struct NativeMiniDfsCluster* cl = NULL;
    jobject bld = NULL, bld2 = NULL, cobj = NULL;
    jvalue val;
    JNIEnv *env = getJNIEnv();
    jthrowable jthr;
    jstring jconfStr;
{code}
and only initialized later if conf->webhdfsEnabled:
{code}
    ...
    if (conf->webhdfsEnabled) {
        jthr = newJavaStr(env, DFS_WEBHDFS_ENABLED_KEY, &jconfStr);
        if (jthr) {
            printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
    ...
{code}
Then we try to free this uninitialized memory at the end, usually resulting in a crash.
{code}
    (*env)->DeleteLocalRef(env, jconfStr);
    return cl;
{code}
[jira] [Created] (HDFS-4133) Add testcases for testing basic snapshot functionalities
Jing Zhao created HDFS-4133: --- Summary: Add testcases for testing basic snapshot functionalities Key: HDFS-4133 URL: https://issues.apache.org/jira/browse/HDFS-4133 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Add test cases for basic snapshot functionality. In the tests we keep creating snapshots, modifying the original files, and checking previous snapshots.
[jira] [Updated] (HDFS-4133) Add testcases for testing basic snapshot functionalities
[ https://issues.apache.org/jira/browse/HDFS-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4133: Attachment: HDFS-4133.001.patch Initial patch. Also renames the original TestSnapshot.java to TestSnapshotPathINodes.java. Add testcases for testing basic snapshot functionalities Key: HDFS-4133 URL: https://issues.apache.org/jira/browse/HDFS-4133 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4133.001.patch Add test cases for basic snapshot functionality. In the tests we keep creating snapshots, modifying the original files, and checking previous snapshots.
[jira] [Updated] (HDFS-4133) Add testcases for testing basic snapshot functionalities
[ https://issues.apache.org/jira/browse/HDFS-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4133: Attachment: HDFS-4133-test.002.patch Thanks for the comments, Suresh! New patch uploaded. Add testcases for testing basic snapshot functionalities Key: HDFS-4133 URL: https://issues.apache.org/jira/browse/HDFS-4133 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4133.001.patch, HDFS-4133-test.002.patch Add test cases for basic snapshot functionality. In the tests we keep creating snapshots, modifying the original files, and checking previous snapshots.