[jira] [Commented] (HDFS-3995) Use DFSTestUtil.createFile() for file creation and writing in test cases

2012-10-02 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468217#comment-13468217
 ] 

Jing Zhao commented on HDFS-3995:
-

The failed test cases seem unrelated: 
TestPersistBlocks#TestRestartDfsWithFlush -- HDFS-3811
TestNameNodeMetrics#testCorruptBlock -- HDFS-2434
TestBPOfferService#testBasicFunctionality -- HDFS-3930
TestHdfsNativeCodeLoader#testNativeCodeLoaded -- HDFS-3753



 Use DFSTestUtil.createFile() for file creation and writing in test cases
 

 Key: HDFS-3995
 URL: https://issues.apache.org/jira/browse/HDFS-3995
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-3995.trunk.001.patch


 Currently many tests define and use their own methods to create files and 
 write some number of blocks in MiniDFSCluster. These methods can be 
 consolidated into DFSTestUtil.createFile().
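
For illustration, a minimal sketch of the consolidation, assuming the existing 
createFile(FileSystem, Path, long, short, long) overload of DFSTestUtil; 
cluster and blockSize stand in for the test's own MiniDFSCluster handle and 
block size:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DFSTestUtil;

    // Instead of a hand-rolled create-and-write helper in each test:
    FileSystem fs = cluster.getFileSystem();
    Path file = new Path("/test/file1");
    // Create a 2-block file with replication 3 and a fixed seed.
    DFSTestUtil.createFile(fs, file, 2 * blockSize, (short) 3, 0L);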

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-03 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2434:


Attachment: HDFS-2434.001.patch

Based on Kihwal's analysis, can we solve the problem with the CorruptBlocks 
metric by disabling the heartbeats of the datanodes before marking the block 
as corrupt?
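
A rough sketch of that approach, assuming a test-only hook such as 
DataNodeTestUtils.setHeartbeatsDisabledForTests(..) around the 
DataNode#heartbeatsDisabledForTests flag (cluster is the test's 
MiniDFSCluster):

    import org.apache.hadoop.hdfs.server.datanode.DataNode;
    import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;

    // Stop heartbeats so no DN can re-report the replica before the
    // CorruptBlocks metric is read.
    for (DataNode dn : cluster.getDataNodes()) {
      DataNodeTestUtils.setHeartbeatsDisabledForTests(dn, true);
    }
    // ... mark the replica corrupt, then assert on the metric ...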

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Uma Maheswara Rao G
 Attachments: HDFS-2434.001.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
 was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-03 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2434:


Attachment: HDFS-2434.002.patch

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Uma Maheswara Rao G
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
 was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

2012-10-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3920:


Attachment: HDFS-3920-005.patch

Updated the cleanup patch based on committed changes in HDFS-3916.

 libwebhdfs code cleanup: string processing and using strerror consistently to 
 handle all errors
 ---

 Key: HDFS-3920
 URL: https://issues.apache.org/jira/browse/HDFS-3920
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, 
 HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, 
 HDFS-3920-005.patch


 1. Clean up code for string processing;
 2. Use strerror consistently for error handling;
 3. Use sprintf to replace decToOctal;
 4. Fix other remaining issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

2012-10-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469968#comment-13469968
 ] 

Jing Zhao commented on HDFS-3920:
-

Colin and Andy, thank you so much for the review and comments! I'm addressing 
the comments now and will post an updated patch later.

 libwebhdfs code cleanup: string processing and using strerror consistently to 
 handle all errors
 ---

 Key: HDFS-3920
 URL: https://issues.apache.org/jira/browse/HDFS-3920
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, 
 HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, 
 HDFS-3920-005.patch


 1. Clean up code for string processing;
 2. Use strerror consistently for error handling;
 3. Use sprintf to replace decToOctal;
 4. Fix other remaining issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


 Assignee: Jing Zhao  (was: nkeywal)
Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.
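
A sketch of how point 2 could surface as configuration; the key names 
dfs.namenode.avoid.write.stale.datanode and 
dfs.namenode.write.stale.datanode.ratio are assumed here and may differ in 
the final patch:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Separate switch for writes, independent of the read-side setting.
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    // If more than this fraction of DNs are stale, stop avoiding them so
    // the cluster does not run out of write targets (point 1's adaptivity).
    conf.setFloat("dfs.namenode.write.stale.datanode.ratio", 0.5f);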

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


Attachment: HDFS-3912.006.patch

Uploaded the patch with minor updates.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


Attachment: HDFS-3912.007.patch

The DataNode#heartbeatsDisabledForTests flag should be declared volatile, and 
for the new test cases in TestReplicationPolicy, instead of waiting, I 
explicitly call the heartbeatCheck() method.
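
A minimal sketch of the declaration change in DataNode:

    // volatile makes the test thread's update visible to the heartbeat
    // thread without any additional synchronization.
    private volatile boolean heartbeatsDisabledForTests = false;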

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


Attachment: HDFS-3912.008.patch

Addressed Nicolas's comments. Now we check whether the stale interval is 
positive, instead of only printing the original warning message.
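
One way the check could look (a sketch only; the variable name and the 
failure behavior are assumptions):

    // Fail fast on a non-positive stale interval instead of only warning.
    if (staleInterval <= 0) {
      throw new IllegalArgumentException(
          "Stale interval must be positive: " + staleInterval);
    }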

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


Attachment: HDFS-3912.009.patch

Updated based on Suresh's comments.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

2012-10-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3920:


Attachment: HDFS-3920-006.patch

Uploaded the patch addressing Colin and Andy's comments.

 libwebhdfs code cleanup: string processing and using strerror consistently to 
 handle all errors
 ---

 Key: HDFS-3920
 URL: https://issues.apache.org/jira/browse/HDFS-3920
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, 
 HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, 
 HDFS-3920-005.patch, HDFS-3920-006.patch


 1. Clean up code for string processing;
 2. Use strerror consistently for error handling;
 3. Use sprintf to replace decToOctal;
 4. Fix other remaining issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473459#comment-13473459
 ] 

Jing Zhao commented on HDFS-3912:
-

Hi Nicolas, 
   I will work on the branch 1.1 patch. Hopefully I can upload the patch today 
or tomorrow.
Thanks,
-Jing

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


Attachment: HDFS-3912-010.patch
HDFS-3912-branch-1.1-001.patch

Patch for branch 1.1. Also did some cleanup for the test code in the patch for 
trunk.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473771#comment-13473771
 ] 

Jing Zhao commented on HDFS-3912:
-

For the 1.1 patch, I've run local tests and all the testcases passed.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4036) FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException

2012-10-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4036:


Attachment: HDFS-4036-trunk.001.patch

Patch uploaded.

 FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException
 ---

 Key: HDFS-4036
 URL: https://issues.apache.org/jira/browse/HDFS-4036
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
 Attachments: HDFS-4036-trunk.001.patch


 The code in FSDirectory.unprotectedAddFile(..) does not throw 
 UnresolvedLinkException, so we should remove "throws UnresolvedLinkException" 
 from the declaration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4036) FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException

2012-10-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4036:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException
 ---

 Key: HDFS-4036
 URL: https://issues.apache.org/jira/browse/HDFS-4036
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
 Attachments: HDFS-4036-trunk.001.patch


 The code in FSDirectory.unprotectedAddFile(..) does not throw 
 UnresolvedLinkException, so we should remove "throws UnresolvedLinkException" 
 from the declaration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4052) FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug information outside of the namesystem lock

2012-10-15 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4052:
---

 Summary: FSNameSystem#invalidateWorkForOneNode and 
FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug 
information outside of the namesystem lock
 Key: HDFS-4052
 URL: https://issues.apache.org/jira/browse/HDFS-4052
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.2.0, 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


Currently in branch-1, both FSNameSystem#invalidateWorkForOneNode and 
FSNameSystem#computeReplicationWorkForBlock print debug information (which can 
be a long message generated by traversing a list/array) without releasing the 
FSNameSystem lock. It would be better to move the logging outside of the 
namesystem lock.

This may also apply to FSNameSystem#invalidateWorkForOneNode in trunk.
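
A minimal sketch of the pattern, with a hypothetical chosenBlocks variable 
standing in for whatever the work-computation selects:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hdfs.protocol.Block;

    List<Block> snapshot = null;
    synchronized (this) {             // the FSNameSystem lock in branch-1
      // ... choose blocks for invalidation/replication work ...
      if (NameNode.stateChangeLog.isDebugEnabled()) {
        snapshot = new ArrayList<Block>(chosenBlocks);  // cheap copy
      }
    }
    if (snapshot != null) {
      // The long message is built and logged without holding the lock.
      NameNode.stateChangeLog.debug("Blocks chosen for work: " + snapshot);
    }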

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4052) FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug information outside of the namesystem lock

2012-10-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4052:


Attachment: HDFS-4052.trunk.001.patch
HDFS-4052.b1.001.patch

Patch uploaded for branch-1 and trunk.

 FSNameSystem#invalidateWorkForOneNode and 
 FSNameSystem#computeReplicationWorkForBlock in branch-1 should print debug 
 information outside of the namesystem lock
 ---

 Key: HDFS-4052
 URL: https://issues.apache.org/jira/browse/HDFS-4052
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.2.0, 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4052.b1.001.patch, HDFS-4052.trunk.001.patch


 Currently in branch-1, both FSNameSystem#invalidateWorkForOneNode and 
 FSNameSystem#computeReplicationWorkForBlock print debug information (which 
 can be a long message generated by traversing a list/array) without releasing 
 the FSNameSystem lock. It would be better to move the logging outside of the 
 namesystem lock.
 This may also apply to FSNameSystem#invalidateWorkForOneNode in trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4059) Add number of stale DataNodes to metrics

2012-10-15 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4059:
---

 Summary: Add number of stale DataNodes to metrics
 Key: HDFS-4059
 URL: https://issues.apache.org/jira/browse/HDFS-4059
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


Add the number of stale DataNodes to metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4059) Add number of stale DataNodes to metrics

2012-10-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4059:


Attachment: HDFS-4059.trunk.001.patch

The patch includes a new testcase (testStaleNodes) for TestNameNodeMetrics. 
However, in my local runs TestNameNodeMetrics#testCorruptBlock may still 
fail; after applying the patch in HDFS-2434, the local tests pass.

 Add number of stale DataNodes to metrics
 

 Key: HDFS-4059
 URL: https://issues.apache.org/jira/browse/HDFS-4059
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 1.1.0, 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4059.trunk.001.patch


 Add the number of stale DataNodes to metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock

2012-10-15 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4062:
---

 Summary: In branch-1, FSNameSystem#invalidateWorkForOneNode and 
FSNameSystem#computeReplicationWorkForBlock should print logs outside of the 
namesystem lock
 Key: HDFS-4062
 URL: https://issues.apache.org/jira/browse/HDFS-4062
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode and 
FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long 
info-level log messages outside of the namesystem lock. We created this 
separate jira since the description and code differ for 1.x.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock

2012-10-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4062:


Attachment: HDFS-4062.b1.001.patch

Patch uploaded.

 In branch-1, FSNameSystem#invalidateWorkForOneNode and 
 FSNameSystem#computeReplicationWorkForBlock should print logs outside of the 
 namesystem lock
 ---

 Key: HDFS-4062
 URL: https://issues.apache.org/jira/browse/HDFS-4062
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4062.b1.001.patch


 Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode 
 and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long 
 info-level log messages outside of the namesystem lock. We created this 
 separate jira since the description and code differ for 1.x.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-2434:
---

Assignee: Jing Zhao

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
 was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4059) Add number of stale DataNodes to metrics

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4059:


Attachment: HDFS-4059.trunk.002.patch

Thanks for the comments Suresh! The testStaleNodes testcase now explicitly 
calls the heartbeatCheck method through a new method in BlockManagerTestUtil, 
so that we can remove the Thread.sleep().
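
For reference, a sketch of the test-side call; checkHeartbeat is the assumed 
name of the new BlockManagerTestUtil helper:

    import org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil;

    // Instead of Thread.sleep(staleInterval + slack), force the heartbeat
    // check so the stale-node count is recomputed immediately.
    BlockManagerTestUtil.checkHeartbeat(
        cluster.getNamesystem().getBlockManager());
    // ... assert on the StaleDataNodes metric right away ...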

 Add number of stale DataNodes to metrics
 

 Key: HDFS-4059
 URL: https://issues.apache.org/jira/browse/HDFS-4059
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 1.1.0, 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4059.trunk.001.patch, HDFS-4059.trunk.002.patch


 Add the number of stale DataNodes to metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4059) Add number of stale DataNodes to metrics

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4059:


Attachment: HDFS-4059.trunk.003.patch

Updated based on Suresh's comments.

 Add number of stale DataNodes to metrics
 

 Key: HDFS-4059
 URL: https://issues.apache.org/jira/browse/HDFS-4059
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 1.1.0, 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4059.trunk.001.patch, HDFS-4059.trunk.002.patch, 
 HDFS-4059.trunk.003.patch


 Add the number of stale DataNodes to metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2434:


Attachment: HDFS-2434.trunk.003.patch

Made some further changes to the patch. In the testCorrupt testcase, because 
the delete operation currently does not remove the pending record in the NN, 
it is possible that the block has already been deleted (due to the deletion 
request) before the DN sends a "block has been received" message back to the 
NN. In that case, the pending record cannot be removed until it times out.

Thus the new patch first waits for the recovery to finish, and then does the 
deletion.
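
A sketch of the reordering; bm stands for the NameNode's BlockManager, and 
getPendingReplicationBlocksCount() is assumed as the way to observe the 
pending queue:

    import com.google.common.base.Supplier;
    import org.apache.hadoop.test.GenericTestUtils;

    // Wait until re-replication completes and the pending queue drains,
    // so the later delete cannot strand a pending record until timeout.
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return bm.getPendingReplicationBlocksCount() == 0L;
      }
    }, 100, 30000);
    fs.delete(file, true);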

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, 
 HDFS-2434.trunk.003.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
 was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3912:


Attachment: HDFS-3912-branch-1.patch

The patch for branch-1.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch


 1. Make the stale timeout adaptive to the number of nodes marked stale in 
 the cluster.
 2. Consider having a separate configuration for writes to skip the stale 
 nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3920) libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3920:


Attachment: HDFS-3920-007.patch

Colin, thanks for the comments! I've addressed most of your comments and will 
file another jira to fix the compile warnings (some of them are generated when 
compiling test code, which will be addressed in HDFS-3923).

 libwebhdfs code cleanup: string processing and using strerror consistently to 
 handle all errors
 ---

 Key: HDFS-3920
 URL: https://issues.apache.org/jira/browse/HDFS-3920
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, 
 HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, 
 HDFS-3920-005.patch, HDFS-3920-006.patch, HDFS-3920-007.patch


 1. Clean up code for string processing;
 2. Use strerror consistently for error handling;
 3. Use sprintf to replace decToOctal;
 4. Fix other remaining issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2434:


Attachment: HDFS-2434.trunk.004.patch

The 003 patch no longer applies to trunk after the changes in HDFS-4059. 
Modified the patch to be consistent.

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, 
 HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
 was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-16 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2434:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Reopened)

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, 
 HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
 was:<0>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4059) Add number of stale DataNodes to metrics

2012-10-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478076#comment-13478076
 ] 

Jing Zhao commented on HDFS-4059:
-

I backported the patch to branch-1. Since the code is different, I will create 
another jira for that.

 Add number of stale DataNodes to metrics
 

 Key: HDFS-4059
 URL: https://issues.apache.org/jira/browse/HDFS-4059
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4059.trunk.001.patch, HDFS-4059.trunk.002.patch, 
 HDFS-4059.trunk.003.patch


 Add the number of stale DataNodes to metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4071) Add number of stale DataNodes to metrics for Branch-1

2012-10-17 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4071:
---

 Summary: Add number of stale DataNodes to metrics for Branch-1
 Key: HDFS-4071
 URL: https://issues.apache.org/jira/browse/HDFS-4071
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


Backport HDFS-4059 to branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4071) Add number of stale DataNodes to metrics for Branch-1

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4071:


Attachment: HDFS-4059-backport-branch-1.001.patch

To avoid bringing extra complexity to TestNameNodeMetrics when changing the 
number of stale nodes in the MiniDFSCluster test, I put the test in 
TestReplicationPolicy.

 Add number of stale DataNodes to metrics for Branch-1
 -

 Key: HDFS-4071
 URL: https://issues.apache.org/jira/browse/HDFS-4071
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 1.1.0, 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4059-backport-branch-1.001.patch


 Backport HDFS-4059 to branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3953) DFSOutputStream constructor does not use bufferSize parameter

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-3953.
-

Resolution: Duplicate

Duplicate of HDFS-4070.

 DFSOutputStream constructor does not use bufferSize parameter
 -

 Key: HDFS-3953
 URL: https://issues.apache.org/jira/browse/HDFS-3953
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 The DFSOutputStream constructor does not use the bufferSize parameter. 
 However, a buffer size is always passed in by many other methods defined in 
 DFSClient, DistributedFileSystem, and Hdfs. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4072:
---

 Summary: When deleting a file, it would be better to also remove 
corresponding block records from BlockManager#pendingReplications
 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


Currently when deleting a file, blockManager does not remove the records 
corresponding to the file's blocks from pendingReplications. These records can 
only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Attachment: TestPendingAndDelete.java

The attached test may generate the scenario where a pendingReplication record 
is left in BlockManager#pendingReplications until timeout.

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Attachment: HDFS-4072.trunk.001.patch

And a simple patch uploaded.

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4072.trunk.001.patch, TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Status: Patch Available  (was: Open)

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4072.trunk.001.patch, TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Attachment: HDFS-4072.trunk.002.patch

Thanks for the comments, Suresh. The modified patch is uploaded.

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, 
 TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4073) Two minor improvements to FSDirectory

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4073:


Attachment: HDFS-4073.trunk.001.patch

Patch uploaded.

 Two minor improvements to FSDirectory
 -

 Key: HDFS-4073
 URL: https://issues.apache.org/jira/browse/HDFS-4073
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 3.0.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4073.trunk.001.patch


 - Add a debug log message to FSDirectory.unprotectedAddFile(..) for the 
 caught IOException.
 - Remove throw UnresolvedLinkException from addToParent(..).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4073) Two minor improvements to FSDirectory

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4073:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 Two minor improvements to FSDirectory
 -

 Key: HDFS-4073
 URL: https://issues.apache.org/jira/browse/HDFS-4073
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 3.0.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4073.trunk.001.patch


 - Add a debug log message to FSDirectory.unprotectedAddFile(..) for the 
 caught IOException.
 - Remove throw UnresolvedLinkException from addToParent(..).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478640#comment-13478640
 ] 

Jing Zhao commented on HDFS-4072:
-

Thanks for the comment, Eli! I think you're right: 
PendingReplicationBlocks#remove only decrements the pending replication count 
by 1; it does not remove the whole record. So I guess we only need to remove 
the whole record for the block from PendingReplicationBlocks here, and we can 
still do this operation in BlockManager#removeBlock().
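
A sketch of that direction; removeAllPending is a hypothetical name for the 
"drop the whole record" operation, as opposed to the decrement-by-one remove 
discussed above:

    // In BlockManager#removeBlock(Block), alongside the existing cleanup:
    blocksMap.removeBlock(block);
    // Drop the entire pending-replication record for this block so it does
    // not linger until the 5~10 min pending timeout fires.
    pendingReplications.removeAllPending(block);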

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, 
 TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-17 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Attachment: HDFS-4072.trunk.003.patch

Updated patch.

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, 
 HDFS-4072.trunk.003.patch, TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) When deleting a file, it would be better to also remove corresponding block records from BlockManager#pendingReplications

2012-10-18 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Attachment: HDFS-4072.trunk.004.patch

Eli, thanks for the advice. To address your comments, I made two replicas 
corrupt and checked that the pending replica count is 2 in the new testcase.

 When deleting a file, it would be better to also remove corresponding block 
 records from BlockManager#pendingReplications
 -

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, 
 HDFS-4072.trunk.003.patch, HDFS-4072.trunk.004.patch, 
 TestPendingAndDelete.java


 Currently when deleting a file, blockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records 
 can only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock

2012-10-18 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479455#comment-13479455
 ] 

Jing Zhao commented on HDFS-4062:
-

test-patch output:
-1 overall.  
+1 @author.  The patch does not contain any @author tags.
-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.
+1 javadoc.  The javadoc tool did not generate any warning messages.
+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.
-1 findbugs.  The patch appears to introduce 222 new Findbugs (version 
2.0.1) warnings.

 In branch-1, FSNameSystem#invalidateWorkForOneNode and 
 FSNameSystem#computeReplicationWorkForBlock should print logs outside of the 
 namesystem lock
 ---

 Key: HDFS-4062
 URL: https://issues.apache.org/jira/browse/HDFS-4062
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4062.b1.001.patch


 Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode 
 and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print 
 info-level log messages outside of the namesystem lock. We create this 
 separate jira since the description and code are different for 1.x.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4072) On file deletion remove corresponding blocks pending replication

2012-10-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4072:


Attachment: HDFS-4072.b1.001.patch

Branch-1 patch. Will run test-patch for it.

 On file deletion remove corresponding blocks pending replication
 

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4072.b1.001.patch, HDFS-4072.patch, 
 HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, 
 HDFS-4072.trunk.003.patch, HDFS-4072.trunk.004.patch, 
 TestPendingAndDelete.java


 Currently when deleting a file, the BlockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records can 
 only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-2434:


Attachment: HDFS-2434.trunk.005.patch

Update the patch based on the change in HDFS-4072.

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, 
 HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch, 
 HDFS-2434.trunk.005.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but 
 was:0
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4072) On file deletion remove corresponding blocks pending replication

2012-10-19 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480427#comment-13480427
 ] 

Jing Zhao commented on HDFS-4072:
-

test-patch result for branch-1 patch:
-1 overall.  
+1 @author.  The patch does not contain any @author tags.
+1 tests included.  The patch appears to include 3 new or modified tests.
+1 javadoc.  The javadoc tool did not generate any warning messages.
+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.
-1 findbugs.  The patch appears to introduce 222 new Findbugs (version 
2.0.1) warnings.


 On file deletion remove corresponding blocks pending replication
 

 Key: HDFS-4072
 URL: https://issues.apache.org/jira/browse/HDFS-4072
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4072.b1.001.patch, HDFS-4072.patch, 
 HDFS-4072.trunk.001.patch, HDFS-4072.trunk.002.patch, 
 HDFS-4072.trunk.003.patch, HDFS-4072.trunk.004.patch, 
 TestPendingAndDelete.java


 Currently when deleting a file, the BlockManager does not remove the records 
 corresponding to the file's blocks from pendingReplications. These records can 
 only be removed after a timeout (5~10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4093) In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3.

2012-10-19 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4093:
---

 Summary: In branch-1-win, AzureBlockPlacementPolicy#chooseTarget 
only returns one DN when replication factor is greater than 3. 
 Key: HDFS-4093
 URL: https://issues.apache.org/jira/browse/HDFS-4093
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao


In branch-1-win, when AzureBlockPlacementPolicy (which extends 
BlockPlacementPolicyDefault) is used, if the client increases the number of 
replicas (e.g., from 3 to 10), AzureBlockPlacementPolicy#chooseTarget will 
return only 1 Datanode each time. Thus in 
FSNameSystem#computeReplicationWorkForBlock, it is possible that the 
replication monitor chooses a datanode that has already been chosen as a 
target and is still in pendingReplications (because 
computeReplicationWorkForBlock does not check pending replications before 
calling chooseTarget).

To avoid this hit-the-same-datanode scenario, we modify 
AzureBlockPlacementPolicy#chooseTarget to make it return multiple DNs.
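
The shape of the fix, as a minimal illustrative sketch (the real 
AzureBlockPlacementPolicy lives in branch-1-win and also applies placement 
constraints; everything below is an assumption for illustration):

{noformat}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: keep choosing until numOfReplicas targets are
// collected, instead of returning after the first choice.
public class ChooseTargetSketch {
  static List<String> chooseTarget(int numOfReplicas, List<String> candidates) {
    List<String> results = new ArrayList<String>();
    for (String dn : candidates) {
      if (results.size() >= numOfReplicas) {
        break;
      }
      results.add(dn);  // the real policy checks rack constraints here
    }
    return results;
  }

  public static void main(String[] args) {
    List<String> dns = Arrays.asList("d1", "d2", "d3", "d4", "d5");
    // replication raised from 3 to 5: all extra targets come back in one
    // call, so the replication monitor does not keep hitting the same DN
    System.out.println(chooseTarget(5, dns));  // [d1, d2, d3, d4, d5]
  }
}
{noformat}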

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4093) In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3.

2012-10-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4093:


Attachment: HDFS-b1-win-4093.001.patch

Patch uploaded.

 In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN 
 when replication factor is greater than 3. 
 ---

 Key: HDFS-4093
 URL: https://issues.apache.org/jira/browse/HDFS-4093
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-b1-win-4093.001.patch


 In branch-1-win, when AzureBlockPlacementPolicy (which extends 
 BlockPlacementPolicyDefault) is used, if the client increases the number of 
 replicas (e.g., from 3 to 10), AzureBlockPlacementPolicy#chooseTarget will 
 return only 1 Datanode each time. Thus in 
 FSNameSystem#computeReplicationWorkForBlock, it is possible that the 
 replication monitor chooses a datanode that has already been chosen as a 
 target and is still in pendingReplications (because 
 computeReplicationWorkForBlock does not check pending replications before 
 calling chooseTarget). 
 To avoid this hit-the-same-datanode scenario, we modify 
 AzureBlockPlacementPolicy#chooseTarget to make it return multiple DNs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4061) TestBalancer and TestUnderReplicatedBlocks need timeouts

2012-10-19 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480523#comment-13480523
 ] 

Jing Zhao commented on HDFS-4061:
-

Nicholas, I checked the test output and I suspect the test failure is caused 
by the following:

The NameNode invalidates a block on a datanode D1 and removes the 
datanode-block pair from the blockMap; before the invalidation request is 
sent to D1, BlockManager#computeDataNodeWork also starts to work and 
schedules a replication to D1. So the invalidation and replication requests 
are sent to D1 at the same time. D1 then ignores the replication request 
(throwing a ReplicaAlreadyExistsException) and deletes the replica. Thus the 
NN cannot receive the blockReceived message from D1, and the testcase times 
out in 5 min, which is smaller than the timeout of a pending replication 
request (usually 5~10 min).

I can file another jira to fix the testcase if you think this analysis is 
correct.

 TestBalancer and TestUnderReplicatedBlocks need timeouts
 

 Key: HDFS-4061
 URL: https://issues.apache.org/jira/browse/HDFS-4061
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.0.3-alpha

 Attachments: hdfs-4061.txt


 Saw TestBalancer and TestUnderReplicatedBlocks timeout hard on a jenkins job 
 recently, let's annotate the relevant tests with timeouts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-4067:
---

Assignee: Jing Zhao

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail

 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061, we can see 
 that the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-19 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480529#comment-13480529
 ] 

Jing Zhao commented on HDFS-4067:
-

Moving the discussion from HDFS-4061 here:

The NameNode invalidates a block on a datanode D1 and removes the 
datanode-block pair from the blockMap; before the invalidation request is 
sent to D1, BlockManager#computeDataNodeWork also starts to work and 
schedules a replication to D1. So the invalidation and replication requests 
are sent to D1 at the same time. D1 then ignores the replication request 
(throwing a ReplicaAlreadyExistsException) and deletes the replica. Thus the 
NN cannot receive the blockReceived message from D1, and the testcase times 
out in 5 min, which is smaller than the timeout of a pending replication 
request (usually 5~10 min).

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail

 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061, we can see 
 that the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-19 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480535#comment-13480535
 ] 

Jing Zhao commented on HDFS-4067:
-

And I guess that's also the reason for HDFS-342? Since the initial replication 
request is ignored, the replication on D1 can only complete after the 
pending-replication timeout.

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail

 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061, we can see 
 that the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-19 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4067:


Attachment: HDFS-4067.trunk.001.patch

Initial patch to fix the issue.

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-4067.trunk.001.patch


 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061, we can see 
 that the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4095) Add snapshot related metrics

2012-10-20 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4095:
---

 Summary: Add snapshot related metrics
 Key: HDFS-4095
 URL: https://issues.apache.org/jira/browse/HDFS-4095
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


Add metrics for the number of snapshots in the system, including 1) the number 
of snapshot files, and 2) the number of snapshot-only files (snapshot files 
that are not deleted although the original file has already been deleted).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4096) Add snapshot information to namenode WebUI

2012-10-20 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4096:
---

 Summary: Add snapshot information to namenode WebUI
 Key: HDFS-4096
 URL: https://issues.apache.org/jira/browse/HDFS-4096
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


Add snapshot information to namenode WebUI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4095) Add snapshot related metrics

2012-10-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4095:


Attachment: HDFS-4095.001.patch

Initial patch defining a group of snapshot-related metrics. 
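
For illustration, such gauges are typically declared through the Hadoop 
metrics2 annotations; a sketch with assumed metric names (not the names in 
HDFS-4095.001.patch):

{noformat}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

// Hypothetical snapshot gauges for the NN metrics source; the field names
// are assumptions for illustration only.
@Metrics(context = "dfs")
public class SnapshotMetricsSketch {
  @Metric("Number of snapshot files in the system")
  MutableGaugeInt numSnapshotFiles;

  @Metric("Snapshot-only files: snapshot files whose original is deleted")
  MutableGaugeInt numSnapshotOnlyFiles;
}
{noformat}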

 Add snapshot related metrics
 

 Key: HDFS-4095
 URL: https://issues.apache.org/jira/browse/HDFS-4095
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-4095.001.patch


 Add metrics for the number of snapshots in the system, including 1) the 
 number of snapshot files, and 2) the number of snapshot-only files (snapshot 
 files that are not deleted although the original file has already been 
 deleted).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4096) Add snapshot information to namenode WebUI

2012-10-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4096:


Attachment: HDFS-4096.relative.001.patch

Initial patch that only adds a snapshot-related stats summary to the NN WebUI.

 Add snapshot information to namenode WebUI
 --

 Key: HDFS-4096
 URL: https://issues.apache.org/jira/browse/HDFS-4096
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-4096.relative.001.patch


 Add snapshot information to namenode WebUI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4106:


Attachment: HDFS-4106-trunk.001.patch

 BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
 declared as volatile
 --

 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch


 All these variables may be assigned/read by a testing thread (through 
 BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
 Thus they should be declared as volatile to ensure happens-before consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4106:
---

 Summary: BPServiceActor#lastHeartbeat, lastBlockReport and 
lastDeletedReport should be declared as volatile
 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch

All these variables may be assigned/read by a testing thread (through 
BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
Thus they should be declared as volatile to ensure happens-before consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4106:


Status: Patch Available  (was: Open)

 BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
 declared as volatile
 --

 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch


 All these variables may be assigned/read by a testing thread (through 
 BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
 Thus they should be declared as volatile to ensure happens-before consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4067:


Status: Patch Available  (was: Open)

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-4067.trunk.001.patch


 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061, we can see 
 that the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482105#comment-13482105
 ] 

Jing Zhao commented on HDFS-3616:
-

Also got this exception in HDFS-4106. It seems the exception happens because 
one thread is iterating over the HashMap bpSlices (FsVolumeImpl#shutdown) 
while another thread is removing entries from the same HashMap 
(FsVolumeImpl#shutdownBlockPool). A quick fix would be to change bpSlices from 
a HashMap to a ConcurrentHashMap.
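
A self-contained sketch of why that helps (a toy stand-in for bpSlices, not 
the HDFS code itself):

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// With java.util.HashMap, iterating in one thread while another thread
// removes entries intermittently throws ConcurrentModificationException.
// ConcurrentHashMap iterators are weakly consistent and never throw it.
public class CmeDemo {
  public static void main(String[] args) throws InterruptedException {
    final Map<String, String> bpSlices = new ConcurrentHashMap<String, String>();
    for (int i = 0; i < 100000; i++) {
      bpSlices.put("BP-" + i, "slice");
    }
    Thread remover = new Thread(new Runnable() {
      public void run() {
        for (String key : bpSlices.keySet()) {
          bpSlices.remove(key);  // concurrent removal, like shutdownBlockPool()
        }
      }
    });
    remover.start();
    // concurrent iteration, like FsVolumeImpl#shutdown(); this would fail
    // intermittently if bpSlices were a plain HashMap
    for (Map.Entry<String, String> e : bpSlices.entrySet()) {
      e.getKey();
    }
    remover.join();
    System.out.println("no ConcurrentModificationException");
  }
}
{noformat}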

 TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
 in DN shutdown
 --

 Key: HDFS-3616
 URL: https://issues.apache.org/jira/browse/HDFS-3616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Uma Maheswara Rao G

 I have seen this in precommit build #2743
 {noformat}
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
   at 
 org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-3616:
---

Assignee: Jing Zhao

 TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
 in DN shutdown
 --

 Key: HDFS-3616
 URL: https://issues.apache.org/jira/browse/HDFS-3616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao

 I have seen this in precommit build #2743
 {noformat}
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
   at 
 org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482108#comment-13482108
 ] 

Jing Zhao commented on HDFS-4106:
-

Failing testcases are related to HDFS-3616 (TestWebHdfsWithMultipleNameNodes) 
and HDFS-4067 (TestUnderReplicatedBlocks).

 BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
 declared as volatile
 --

 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch


 All these variables may be assigned/read by a testing thread (through 
 BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
 Thus they should be declared as volatile to ensure happens-before consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482131#comment-13482131
 ] 

Jing Zhao commented on HDFS-4067:
-

This testcase failure was reported in HDFS-3948 before. Will run 
TestUnderReplicatedBlocks in a loop later.

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-4067.trunk.001.patch


 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061, we can see 
 that the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently

2012-10-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482461#comment-13482461
 ] 

Jing Zhao commented on HDFS-2434:
-

Have run the testcase 551 times locally and all of them passed.

 TestNameNodeMetrics.testCorruptBlock fails intermittently
 -

 Key: HDFS-2434
 URL: https://issues.apache.org/jira/browse/HDFS-2434
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, 
 HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch, 
 HDFS-2434.trunk.005.patch


 java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but 
 was:0
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
   at 
 org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3616:


Attachment: HDFS-3616.trunk.001.patch

After checking the code, I guess the exception is caused by the following 
process:

1. In DataNode#shutdown(), DataNode#shouldRun is set to false.

2. BPServiceActor#run() stops running, and runs BPServiceActor#cleanUp().

3. While executing BPServiceActor#cleanUp(), DataNode#shutdownBlockPool() is 
called, where blockPoolManager.remove(bpos) is executed before 
this.blockPoolManager.shutDownAll() is called in DataNode#shutdown(). Thus 
the corresponding BPOfferService cannot be seen and shut down by 
blockPoolManager#shutDownAll(), since it has been removed from 
BlockPoolManager#offerServices.

4. The actor thread continues running DataNode#shutdownBlockPool(), which 
finally tries to remove its record from FsVolumeImpl#bpSlices, while the 
DataNode shutdown thread runs into FsVolumeImpl#shutdown(), which iterates 
over bpSlices. Thus the ConcurrentModificationException may be thrown.

So to avoid changing other code, maybe we can simply change bpSlices from a 
HashMap to a ConcurrentHashMap? A simple patch based on this is attached.

 TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
 in DN shutdown
 --

 Key: HDFS-3616
 URL: https://issues.apache.org/jira/browse/HDFS-3616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Attachments: HDFS-3616.trunk.001.patch


 I have seen this in precommit build #2743
 {noformat}
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
   at 
 org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482772#comment-13482772
 ] 

Jing Zhao commented on HDFS-4106:
-

Thanks for the comments Brandon! So the cost of a volatile read/write may be 
an extra memory access. For a BPServiceActor thread, which communicates with 
the NN periodically, I think this should not cause a performance problem 
(also considering that variables like lastHeartbeat are not accessed often). 
Without the volatile keyword, however, it is possible that 
triggerHeartbeatForTests cannot trigger the heartbeat as it intends to, since 
its change to lastHeartbeat may not be seen by the actor thread. Likewise, 
the testing thread may wait for an unknown period of time because the actor 
thread's change to lastHeartbeat may not be seen by the testing thread.
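
A minimal sketch of the visibility issue (a simplified stand-in for 
BPServiceActor; the field and method names follow this discussion, not the 
exact source):

{noformat}
// Without volatile, the actor loop may cache lastHeartbeat in a register,
// so the test thread's write below might never be observed, and vice versa.
public class HeartbeatVisibilitySketch {
  private static final long HEARTBEAT_INTERVAL_MS = 3000;  // illustrative
  private volatile long lastHeartbeat = System.currentTimeMillis();

  // actor thread: send a heartbeat once the interval has elapsed
  void actorLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      if (System.currentTimeMillis() - lastHeartbeat >= HEARTBEAT_INTERVAL_MS) {
        lastHeartbeat = System.currentTimeMillis();  // "send" the heartbeat
      }
      Thread.sleep(10);
    }
  }

  // testing thread: zero the timestamp so the next loop iteration fires a
  // heartbeat immediately; this write must be visible to the actor thread
  void triggerHeartbeatForTests() {
    lastHeartbeat = 0;
  }
}
{noformat}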

 BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
 declared as volatile
 --

 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch


 All these variables may be assigned/read by a testing thread (through 
 BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
 Thus they should be declared as volatile to ensure happens-before consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4093) In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN when replication factor is greater than 3.

2012-10-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4093:


Attachment: HDFS-b1-win-4093.002.patch

Updated the patch. It passes the local testcases.

 In branch-1-win, AzureBlockPlacementPolicy#chooseTarget only returns one DN 
 when replication factor is greater than 3. 
 ---

 Key: HDFS-4093
 URL: https://issues.apache.org/jira/browse/HDFS-4093
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-b1-win-4093.001.patch, HDFS-b1-win-4093.002.patch


 In branch-1-win, when AzureBlockPlacementPolicy (which extends the 
 BlockPlacementPolicyDefault) is used, if the client increases the number of 
 replicas (e.g., from 3 to 10), AzureBlockPlacementPolicy#chooseTarget will 
 return only 1 Datanode each time. Thus in 
 FSNameSystem#computeReplicationWorkForBlock, it is possible that the 
 replication monitor may choose a datanode that has been chosen as target but 
 still in the pendingReplications (because computeReplicationWorkForBlock does 
 not check the pending replication before doing the chooseTarget). 
 To avoid this hit-the-same-datanode scenario, we modify the 
 AzureBlockPlacementPolicy#chooseTarget to make it return multiple DN. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile

2012-10-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4106:


Attachment: HDFS-4106-trunk.002.patch

Updated based on Brandon's comments.

 BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be 
 declared as volatile
 --

 Key: HDFS-4106
 URL: https://issues.apache.org/jira/browse/HDFS-4106
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-4106-trunk.001.patch, HDFS-4106-trunk.002.patch


 All these variables may be assigned/read by a testing thread (through 
 BPServiceActor#triggerXXX) while also being assigned/read by the actor thread. 
 Thus they should be declared as volatile to ensure happens-before consistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3616:


Attachment: HDFS-3616.trunk.002.patch

After discussing with Nicholas, we think that to avoid the 
ConcurrentModificationException we only need to keep a copy of 
BlockPoolManager#offerServices before we set DataNode#shouldRun to false. In 
that case, blockPoolManager#shutDownAll() can access and shut down all the 
actor threads, so no concurrent access to bpSlices will happen anymore.

Uploaded a patch based on this.
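
Roughly, the approach looks like this (a sketch using the names in this 
comment; the signatures are assumptions and the actual patch may differ):

{noformat}
// Hypothetical sketch of DataNode#shutdown() with the snapshot taken first.
void shutdown() {
  // snapshot the actors while every BPOfferService is still registered
  List<BPOfferService> bposList = (blockPoolManager == null)
      ? null : blockPoolManager.getAllNamenodeThreads();

  shouldRun = false;  // actors now start their cleanUp() paths

  // shut down from the snapshot, so actors that already removed themselves
  // from offerServices during cleanUp() are still reached; no thread is
  // left mutating bpSlices while FsVolumeImpl#shutdown() iterates it
  if (bposList != null) {
    blockPoolManager.shutDownAll(bposList);
  }
}
{noformat}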

 TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
 in DN shutdown
 --

 Key: HDFS-3616
 URL: https://issues.apache.org/jira/browse/HDFS-3616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Attachments: HDFS-3616.trunk.001.patch, HDFS-3616.trunk.002.patch


 I have seen this in precommit build #2743
 {noformat}
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
   at 
 org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3616:


Status: Patch Available  (was: Open)

 TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
 in DN shutdown
 --

 Key: HDFS-3616
 URL: https://issues.apache.org/jira/browse/HDFS-3616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Attachments: HDFS-3616.trunk.001.patch, HDFS-3616.trunk.002.patch


 I have seen this in precommit build #2743
 {noformat}
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
   at 
 org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown

2012-10-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3616:


Attachment: HDFS-3616.trunk.003.patch

Need to check if blockPoolManager is null before calling getAllNamenodeThreads().

 TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
 in DN shutdown
 --

 Key: HDFS-3616
 URL: https://issues.apache.org/jira/browse/HDFS-3616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Attachments: HDFS-3616.trunk.001.patch, HDFS-3616.trunk.002.patch, 
 HDFS-3616.trunk.003.patch


 I have seen this in precommit build #2743
 {noformat}
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$EntryIterator.next(HashMap.java:834)
   at java.util.HashMap$EntryIterator.next(HashMap.java:832)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
   at 
 org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

2012-10-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483395#comment-13483395
 ] 

Jing Zhao commented on HDFS-4067:
-

Ran the testcase ~800 times and all of them passed.

 TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
 ---

 Key: HDFS-4067
 URL: https://issues.apache.org/jira/browse/HDFS-4067
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
  Labels: test-fail
 Attachments: HDFS-4067.trunk.001.patch


 After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see 
 the root cause of the failure is ReplicaAlreadyExistsException:
 {noformat}
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 
 already exists in state FINALIZED and thus cannot be created.
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
   at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:155)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy

2012-10-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3948:


Attachment: HDFS-3948-regenerate-exception.patch

Also got this exception in HDFS-3616 and HDFS-4067. After checking the code, I 
guess this exception may be caused by the following process:

1. A FSDataOutputStream instance (out4) is created through 
WebHdfsFileSystem#create, in order to create and write a new file.

2. The request is redirected to a DN, where DFSClient#create is called to 
create the file in the NN through RPC.

3. At this time, the test has called MiniDfsCluster#shutdownNameNode, and in 
NameNode#stop(), the FSNamesystem has been shut down (closing the FSEditLog) 
but the RPC server has not been closed yet.

4. The RPC request from the DN is sent to the NN and FSEditLog#logEdit is 
called for the creation. But at this time the FSEditLog has already been 
closed and FSEditLog#editLogStream has been set to null.

Therefore, if assertions are enabled, a bad state (CLOSED) will finally be 
returned to the client (the case in HDFS-3948); if assertions are not enabled, 
an NPE will be returned instead, because FSEditLog#editLogStream has been set 
to null, as reported in HDFS-3822.

The attached patch can regenerate the exception.

 TestWebHDFS#testNamenodeRestart is racy 
 

 Key: HDFS-3948
 URL: https://issues.apache.org/jira/browse/HDFS-3948
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
 Attachments: HDFS-3948-regenerate-exception.patch


 After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails 
 when looping it; on my system it takes about 40 runs. WebHdfsFileSystem#close 
 is racing with the restart, resulting in an add block after the edit log is 
 closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy

2012-10-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-3948:
---

Assignee: Jing Zhao

 TestWebHDFS#testNamenodeRestart is racy 
 

 Key: HDFS-3948
 URL: https://issues.apache.org/jira/browse/HDFS-3948
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
 Attachments: HDFS-3948-regenerate-exception.patch


 After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails 
 when looping it; on my system it takes about 40 runs. WebHdfsFileSystem#close 
 is racing with the restart, resulting in an add block after the edit log is 
 closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy

2012-10-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483646#comment-13483646
 ] 

Jing Zhao commented on HDFS-3948:
-

Correction: the NPE in HDFS-3822 should be caused by a BlockManager race, as 
Eli commented.

 TestWebHDFS#testNamenodeRestart is racy 
 

 Key: HDFS-3948
 URL: https://issues.apache.org/jira/browse/HDFS-3948
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
 Attachments: HDFS-3948-regenerate-exception.patch


 After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails 
 when looping it; on my system it takes about 40 runs. WebHdfsFileSystem#close 
 is racing with the restart, resulting in an addBlock call after the edit log 
 is closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy

2012-10-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3948:


Attachment: HDFS-3948-regenerate-exception.002.patch

The prior patch actually generates the exception while the NN is executing 
FSNamesystem#startFile. The new patch can generate the same exception while 
the NN is executing FSNamesystem#allocateBlock (the same exception as the one 
originally reported in HDFS-3948).

 TestWebHDFS#testNamenodeRestart is racy 
 

 Key: HDFS-3948
 URL: https://issues.apache.org/jira/browse/HDFS-3948
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
 Attachments: HDFS-3948-regenerate-exception.002.patch, 
 HDFS-3948-regenerate-exception.patch


 After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails 
 when looping it; on my system it takes about 40 runs. WebHdfsFileSystem#close 
 is racing with the restart, resulting in an addBlock call after the edit log 
 is closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3948) TestWebHDFS#testNamenodeRestart is racy

2012-10-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3948:


Attachment: HDFS-3948.001.patch

Initial patch to fix the testcase. Because webhdfs does not support hflush, it 
is difficult to avoid the race between the webhdfs write at the DN and the 
NN's shutdown. Thus in this patch I close the corresponding FSDataOutputStream 
(instead of calling its hflush) when the testcase is run for webhdfs.
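
A hedged sketch of that change (the helper method and the isWebHdfs flag are 
illustrative names, not the actual test code):

{code}
static void syncOrClose(org.apache.hadoop.fs.FSDataOutputStream out,
    boolean isWebHdfs) throws java.io.IOException {
  if (isWebHdfs) {
    out.close();   // webhdfs has no hflush; closing completes the write
  } else {
    out.hflush();  // hdfs can flush and keep the stream open across restart
  }
}
{code}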

 TestWebHDFS#testNamenodeRestart is racy 
 

 Key: HDFS-3948
 URL: https://issues.apache.org/jira/browse/HDFS-3948
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jing Zhao
 Attachments: HDFS-3948.001.patch, 
 HDFS-3948-regenerate-exception.002.patch, HDFS-3948-regenerate-exception.patch


 After fixing HDFS-3936 I noticed that TestWebHDFS#testNamenodeRestart fails 
 when looping it; on my system it takes about 40 runs. WebHdfsFileSystem#close 
 is racing with the restart, resulting in an addBlock call after the edit log 
 is closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4115) TestHDFSCLI.testAll fails one test due to number format

2012-10-26 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485281#comment-13485281
 ] 

Jing Zhao commented on HDFS-4115:
-

The patch looks good; +1.

 TestHDFSCLI.testAll fails one test due to number format
 ---

 Key: HDFS-4115
 URL: https://issues.apache.org/jira/browse/HDFS-4115
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
 Environment: Apache Maven 3.0.4
 Maven home: /usr/share/maven
 Java version: 1.6.0_35, vendor: Sun Microsystems Inc.
 Java home: /usr/lib/jvm/j2sdk1.6-oracle/jre
 Default locale: en_US, platform encoding: ISO-8859-1
 OS name: linux, version: 3.2.0-32-generic, arch: amd64, family: unix
Reporter: Trevor Robinson
Assignee: Trevor Robinson
 Attachments: HDFS-4115.patch


 This test fails repeatedly on only one of my machines:
 {noformat}
 Failed tests:   testAll(org.apache.hadoop.cli.TestHDFSCLI): One of the tests 
 failed. See the Detailed results to identify the command that failed
Test ID: [587]
   Test Description: [report: Displays the report about the Datanodes]
  Test Commands: [-fs hdfs://localhost:35254 -report]
 Comparator: [RegexpComparator]
 Comparision result:   [fail]
Expected output:   [Configured Capacity: [0-9]+ \([0-9]+\.[0-9]+ 
 [BKMGT]+\)]
  Actual output:   [Configured Capacity: 472446337024 (440 GB)
 {noformat}
 The problem appears to be that {{StringUtils.byteDesc}} calls 
 {{limitDecimalTo2}} which calls {{DecimalFormat.format}} with a pattern of 
 {{#.##}}. This pattern does not include trailing zeroes, so the expected 
 regex is incorrect in requiring a decimal.
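 For illustration, a standalone snippet (not part of the patch) demonstrating 
 the DecimalFormat behavior:
 {code}
 import java.text.DecimalFormat;

 public class ByteDescDemo {
   public static void main(String[] args) {
     DecimalFormat df = new DecimalFormat("#.##");
     System.out.println(df.format(440.0));  // prints "440" -- no decimal point
     System.out.println(df.format(439.96)); // prints "439.96"
   }
 }
 {code}
 So a round capacity like "440 GB" contains no decimal point, and the expected 
 regex fails on machines with such capacities.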

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable it to return more information other than the INode array

2012-10-28 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4124:
---

 Summary: Refactor INodeDirectory#getExistingPathINodes() to enable 
it to return more information other than the INode array
 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


Currently INodeDirectory#getExistingPathINodes() uses an INode array to return 
the INodes resolved from the given path. For snapshots, we need the function 
to be able to return more information when resolving a path for a snapshot 
file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-28 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4124:


Attachment: HDFS-INodeDirecotry.trunk.001.patch

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-28 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4124:


Attachment: (was: HDFS-INodeDirecotry.trunk.001.patch)

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor

 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-28 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4124:


Attachment: HDFS-INodeDirecotry.trunk.001.patch

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-28 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4124:


Status: Patch Available  (was: Open)

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-28 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4124:


Attachment: HDFS-INodeDirecotry.trunk.002.patch

Thanks for the comments, Suresh! So in the new patch I change the method 
signature to INodesInPath getExistingPathINodes(byte[][] components, int 
numOfINodes, boolean resolveLink), where the parameter numOfINodes indicates 
the number of INodes expected to be returned.
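
For illustration, a hedged sketch of how a caller might use the refactored 
method (the getINodes accessor here is an assumption, not necessarily the 
committed API):

{code}
// Resolve the path and ask for the last 2 existing INodes on it.
INodesInPath iip = rootDir.getExistingPathINodes(components, 2, true);
INode[] inodes = iip.getINodes(); // assumed accessor for the INode array
INode last = inodes[inodes.length - 1];
{code}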

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch, 
 HDFS-INodeDirecotry.trunk.002.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485837#comment-13485837
 ] 

Jing Zhao commented on HDFS-4124:
-

We may want to include a number indicating the actual number of elements in 
INodesInPath. Currently, in every place that calls getExistingPathINodes, the 
capacity of INodesInPath's INode array is always <= the size of components, so 
implementing INodesInPath without this number seems fine for now. (However, 
based on the logic in getExistingPathINodes, the capacity of INodesInPath is 
allowed to be larger than the size of components. Thus I guess we may add this 
number later, or in the snapshot branch first.)
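
A hedged skeleton of the idea (the field names are illustrative, not committed 
code):

{code}
class INodesInPath {
  // The capacity of the array may exceed the number of INodes actually
  // resolved, so a separate count would record the actual number of elements.
  private final INode[] inodes;
  private int numResolved; // the number discussed above
}
{code}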

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch, 
 HDFS-INodeDirecotry.trunk.002.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485862#comment-13485862
 ] 

Jing Zhao commented on HDFS-4124:
-

The test failure has been reported in HDFS-3267 and HDFS-3538.

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch, 
 HDFS-INodeDirecotry.trunk.002.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4124) Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array

2012-10-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4124:


Attachment: HDFS-INodeDirecotry.trunk.003.patch

Update the javadoc for INodesInPath.

 Refactor INodeDirectory#getExistingPathINodes() to enable returning more than 
 INode array
 -

 Key: HDFS-4124
 URL: https://issues.apache.org/jira/browse/HDFS-4124
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-INodeDirecotry.trunk.001.patch, 
 HDFS-INodeDirecotry.trunk.002.patch, HDFS-INodeDirecotry.trunk.003.patch


 Currently INodeDirectory#getExistingPathINodes() uses an INode array to 
 return the INodes resolved from the given path. For snapshots, we need the 
 function to be able to return more information when resolving a path for a 
 snapshot file/dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4127) Log message is not correct in case of short of replica

2012-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486345#comment-13486345
 ] 

Jing Zhao commented on HDFS-4127:
-

Hi Junping, so after you make data nodes 0 and 1 not qualified for choosing, I 
think you may also need to reset these two nodes back to a healthy state at 
the end of the new testcase; otherwise the two unqualified nodes will affect 
the following testcases.

Another thing: when calculating the number of replicas still needed for the 
log output, do we also need to consider the original size of the results? For 
example, when calling chooseTarget, if there are already 3 nodes chosen (i.e., 
we want to increase the number of replicas from 3 to N), then after selecting 
another S nodes, totalReplicasExpected should be N-3, the size of the results 
is 3+S, and we should expect (N-3)-(3+S)+3 = N-3-S more nodes.
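
To make the arithmetic concrete with hypothetical numbers: take N=10 and S=2, 
so 3 nodes were already chosen and 2 more were just selected. Then 
totalReplicasExpected = N-3 = 7, the size of the results is 3+S = 5, and the 
number still needed is (N-3)-(3+S)+3 = 7-5+3 = 5, which matches 
N-3-S = 10-3-2 = 5.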

 Log message is not correct in case of short of replica
 --

 Key: HDFS-4127
 URL: https://issues.apache.org/jira/browse/HDFS-4127
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.4, 2.0.2-alpha
Reporter: Junping Du
Assignee: Junping Du
Priority: Minor
 Attachments: HDFS-4127.patch


 When a block cannot be placed with enough replicas for some reason (e.g., not 
 enough available data nodes), a warning is logged with the wrong number of 
 replicas short.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4127) Log message is not correct in case of short of replica

2012-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486393#comment-13486393
 ] 

Jing Zhao commented on HDFS-4127:
-

Yeah, I think that will be good. (And in that case the meaning of 
totalReplicasExpected will be the total number of replicas expected by the 
system, not the total number of extra replicas expected from the chooseTarget 
method.)

 Log message is not correct in case of short of replica
 --

 Key: HDFS-4127
 URL: https://issues.apache.org/jira/browse/HDFS-4127
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.4, 2.0.2-alpha
Reporter: Junping Du
Assignee: Junping Du
Priority: Minor
 Attachments: HDFS-4127.patch


 When a block cannot be placed with enough replicas for some reason (e.g., not 
 enough available data nodes), a warning is logged with the wrong number of 
 replicas short.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4127) Log message is not correct in case of short of replica

2012-10-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486667#comment-13486667
 ] 

Jing Zhao commented on HDFS-4127:
-

The new patch looks good; +1.

 Log message is not correct in case of short of replica
 --

 Key: HDFS-4127
 URL: https://issues.apache.org/jira/browse/HDFS-4127
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.4, 2.0.2-alpha
Reporter: Junping Du
Assignee: Junping Du
Priority: Minor
 Attachments: HDFS-4127.patch, HDFS-4127.patch


 When a block cannot be placed with enough replicas for some reason (e.g., not 
 enough available data nodes), a warning is logged with the wrong number of 
 replicas short.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4118) Change INodeDirectory.getExistingPathINodes(..) to work with snapshots

2012-10-30 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4118:


Attachment: HDFS-4118.001.patch

Patch uploaded. The current patch also contains several testcases to test 
getExistingPathINodes under different scenarios.

 Change INodeDirectory.getExistingPathINodes(..) to work with snapshots
 --

 Key: HDFS-4118
 URL: https://issues.apache.org/jira/browse/HDFS-4118
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
 Attachments: HDFS-4118.001.patch


 {code}
 int getExistingPathINodes(byte[][] components, INode[] existing, boolean 
 resolveLink)
 {code}
 The INodeDirectory method above retrieves existing INodes from the given path 
 components.  It needs to be updated in order to understand snapshot paths.
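 For illustration, the kind of path the updated method would need to resolve, 
 assuming the usual .snapshot path convention from the snapshot design:
 {code}
 // "/user/alice/.snapshot/s0/f" names file f as it existed in snapshot s0.
 byte[][] components = INode.getPathComponents("/user/alice/.snapshot/s0/f");
 {code}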

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3923) libwebhdfs testing code cleanup

2012-10-30 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3923:


Attachment: HDFS-3923.002.patch

Updated the patch.

 libwebhdfs testing code cleanup
 ---

 Key: HDFS-3923
 URL: https://issues.apache.org/jira/browse/HDFS-3923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3923.001.patch, HDFS-3923.002.patch


 1. Testing code cleanup for libwebhdfs
 1.1 Tests should generate a test-specific filename and should use TMPDIR 
 appropriately.
 2. Enabling automated testing

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4132) when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory

2012-10-31 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488387#comment-13488387
 ] 

Jing Zhao commented on HDFS-4132:
-

That's a bug introduced by HDFS-3923. Thanks for the fix, Colin!

 when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized 
 memory 
 ---

 Key: HDFS-4132
 URL: https://issues.apache.org/jira/browse/HDFS-4132
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-4132.001.patch


 When libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized 
 memory.
 Details: jconfStr is declared uninitialized...
 {code}
 struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
 {
 struct NativeMiniDfsCluster* cl = NULL;
 jobject bld = NULL, bld2 = NULL, cobj = NULL;
 jvalue  val;
 JNIEnv *env = getJNIEnv();
 jthrowable jthr;
 jstring jconfStr;
 {code}
 and only initialized later if conf->webhdfsEnabled:
 {code}
 ...
 if (conf->webhdfsEnabled) {
 jthr = newJavaStr(env, DFS_WEBHDFS_ENABLED_KEY, &jconfStr);
 if (jthr) {
 printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
 ...
 {code}
 Then we try to free this uninitialized memory at the end, usually resulting 
 in a crash.
 {code}
 (*env)->DeleteLocalRef(env, jconfStr);
 return cl;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4133) Add testcases for testing basic snapshot functionalities

2012-10-31 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4133:
---

 Summary: Add testcases for testing basic snapshot functionalities
 Key: HDFS-4133
 URL: https://issues.apache.org/jira/browse/HDFS-4133
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


Add testcases for basic snapshot functionalities. In the test we keep creating 
snapshots, modifying the original files, and checking previous snapshots.
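
A hedged sketch of the test loop (createSnapshot assumes the snapshot-branch 
API; modifySomeFiles and checkSnapshot are hypothetical helpers):

{code}
static void runSnapshotTest(org.apache.hadoop.hdfs.DistributedFileSystem hdfs,
    org.apache.hadoop.fs.Path dir, int rounds) throws Exception {
  for (int i = 0; i < rounds; i++) {
    hdfs.createSnapshot(dir, "s" + i);   // take a new snapshot
    modifySomeFiles(hdfs, dir);          // mutate the live files
    for (int j = 0; j <= i; j++) {
      checkSnapshot(hdfs, dir, "s" + j); // older snapshots must be unchanged
    }
  }
}
{code}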

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4133) Add testcases for testing basic snapshot functionalities

2012-10-31 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4133:


Attachment: HDFS-4133.001.patch

Initial patch. Also renames the original TestSnapshot.java to 
TestSnapshotPathINodes.java.

 Add testcases for testing basic snapshot functionalities
 

 Key: HDFS-4133
 URL: https://issues.apache.org/jira/browse/HDFS-4133
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-4133.001.patch


 Add testcases for basic snapshot functionalities. In the test we keep 
 creating snapshots, modifying the original files, and checking previous 
 snapshots.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4133) Add testcases for testing basic snapshot functionalities

2012-11-01 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4133:


Attachment: HDFS-4133-test.002.patch

Thanks for the comments, Suresh! New patch uploaded.

 Add testcases for testing basic snapshot functionalities
 

 Key: HDFS-4133
 URL: https://issues.apache.org/jira/browse/HDFS-4133
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, name-node
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-4133.001.patch, HDFS-4133-test.002.patch


 Add testcases for basic snapshot functionalities. In the test we keep 
 creating snapshots, modifying the original files, and checking previous 
 snapshots.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

