[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250006#comment-14250006
 ] 

Hudson commented on HDFS-6425:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249951#comment-14249951
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249908#comment-14249908
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249898#comment-14249898
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249743#comment-14249743
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #778 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/778/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249737#comment-14249737
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #44 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/44/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248465#comment-14248465
 ] 

Hudson commented on HDFS-6425:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6729 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6729/])
HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport 
latency. Contributed by Ming Ma. (kihwal: rev 
b7923a356e9f111619375b94d12749d634069347)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-16 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248463#comment-14248463
 ] 

Ming Ma commented on HDFS-6425:
---

Thanks [~kihwal] and [~arpitagarwal].

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.7.0
>
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-16 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248448#comment-14248448
 ] 

Kihwal Lee commented on HDFS-6425:
--

+1 I've written a similar patch for the same issue. The patch looks good.  
Thanks for posting the updated patch.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247695#comment-14247695
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687355/HDFS-6425-3.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9044//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9044//console

This message is automatically generated.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246808#comment-14246808
 ] 

Kihwal Lee commented on HDFS-6425:
--

Did you have a chance to analyze the cause of the large number of 
over-replication?  It might be due to the race between completeFile and 
incremental block reports. If a file is closed with just min_replicas and the 
replication monitor runs before all the rest of incremental block reports are 
received, replication will be scheduled and this will lead to over-replication. 

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-2.patch, HDFS-6425-Test-Case.pdf, 
> HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246679#comment-14246679
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661072/HDFS-6425-2.patch
  against trunk revision fae3e86.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9037//console

This message is automatically generated.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-2.patch, HDFS-6425-Test-Case.pdf, 
> HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-12-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246672#comment-14246672
 ] 

Kihwal Lee commented on HDFS-6425:
--

[~mingma] The patch looks good, but does not apply to trunk any more. Can you 
refresh it?

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-2.patch, HDFS-6425-Test-Case.pdf, 
> HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093558#comment-14093558
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661072/HDFS-6425-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7611//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7611//console

This message is automatically generated.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-2.patch, HDFS-6425-Test-Case.pdf, 
> HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-07 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089967#comment-14089967
 ] 

Ming Ma commented on HDFS-6425:
---

Thanks, Arpit. This jira can address more common NN failover scenario with lots 
of "content stale" storages.

We try to get storages out of "content stale"  as soon as possible. Here are 
several scenarios.

a. For non-HA NN restart, have DN send HB before BR right after registration.
b. For HA setup, NN becomes active right after it restarts. This can happen if 
we have to restart both NNs at the same time, due to some rare outage or some 
incompatible upgrade. In this case, the active NN will first go to standby, 
then get transitioned to active at which point all DNs will be marked as stale 
again. For big clusters, most of the DN reregistration will come in after the 
NN becomes active, so the fix to have DNs send HB and BR right after 
registration will also help.
c. For HA setup, NN becomes active after the NN JVM has been up for some time. 
The failover could happen due to zk session timeout, or the other NN just 
crashes. In this case, there is no DN reregistration given the new active NN 
doesn't have recent restart. We can change the NN to ask DN to resend 
blockreport upon failover, but that will cause cluster performance issue.

So we still have some scenario where we might have lots of "content stale" 
storages. This jira tries to make NN handle the scenario better.


> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089653#comment-14089653
 ] 

Arpit Agarwal commented on HDFS-6425:
-

Hi Ming, is this problem mitigated by your fix for HDFS-6772?

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085630#comment-14085630
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659743/HDFS-6425.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7560//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7560//console

This message is automatically generated.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085278#comment-14085278
 ] 

Hadoop QA commented on HDFS-6425:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12659706/HDFS-6425-Test-Case.pdf
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7559//console

This message is automatically generated.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-01 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083131#comment-14083131
 ] 

Ming Ma commented on HDFS-6425:
---

PostponedMisreplicatedBlocks > 1M can cause BR latency to spike to couple 
seconds. 

1. Given all the DNs are marked as blockContentsStale after NN fail overs, 
rescanPostponedMisreplicatedBlocks isn't going to find many blocks to remove 
until majority of DNs send their blockreports.
2. Normally it is ok to not remove stay over replicated right away. So 
rescanPostponedMisreplicatedBlocks can wait until most if not all of the DN 
storages aren't marked as blockContentsStale anymore.

Ideas on how to fix it:

Rescan postponed blocks only after all/most of DN storages aren't marked as 
blockContentsStale anymore. In that way, postponed blocks won't impact BR until 
most of DNs have sent BRs. After that, postponed blocks will be drained 
steadily. We can do it in a background thread instead of during BR call.

Alternatively, [~lohit] and I also discussed using HashMap to store postponed 
blocks, keyed by DN storage, that means each BR doesn't need to scan the whole 
set and thus improve the performance.

Suggestions?



> Large postponedMisreplicatedBlocks has impact on blockReport latency
> 
>
> Key: HDFS-6425
> URL: https://issues.apache.org/jira/browse/HDFS-6425
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-6425.patch
>
>
> Sometimes we have large number of over replicates when NN fails over. When 
> the new active NN took over, over replicated blocks will be put to 
> postponedMisreplicatedBlocks until all DNs for that block aren't stale 
> anymore.
> We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
> became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
> just kept increasing until the cluster is stable. 
> In addition, large postponedMisreplicatedBlocks could make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)