[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode

2015-08-13 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-7980:
--
Attachment: hadoop-241.patch

> Incremental BlockReport will dramatically slow down the startup of a namenode
> --
>
> Key: HDFS-7980
> URL: https://issues.apache.org/jira/browse/HDFS-7980
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hui Zheng
>Assignee: Walter Su
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, 
> HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch, 
> hadoop-241.patch
>
>
> In the current implementation, the datanode calls the 
> reportReceivedDeletedBlocks() method, which sends an IncrementalBlockReport, 
> before calling the bpNamenode.blockReport() method. In a large (several 
> thousand datanodes) and busy cluster, this slows down the startup of the 
> namenode (more than one hour). 
> {code}
> List<DatanodeCommand> blockReport() throws IOException {
>   // send block report if timer has expired.
>   final long startTime = now();
>   if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
>     return null;
>   }
>   final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>();
>   // Flush any block information that precedes the block report. Otherwise
>   // we have a chance that we will miss the delHint information
>   // or we will report an RBW replica after the BlockReport already reports
>   // a FINALIZED one.
>   reportReceivedDeletedBlocks();
>   lastDeletedReport = startTime;
>   .
>   // Send the reports to the NN.
>   int numReportsSent = 0;
>   int numRPCs = 0;
>   boolean success = false;
>   long brSendStartTime = now();
>   try {
>     if (totalBlockCount < dnConf.blockReportSplitThreshold) {
>       // Below split threshold, send all reports in a single message.
>       DatanodeCommand cmd = bpNamenode.blockReport(
>           bpRegistration, bpos.getBlockPoolId(), reports);
> {code}
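
As a rough illustration of why the order of the two calls could matter, here is a toy Java model. It is only a sketch under one loud assumption that the description does not spell out: that the namenode takes a fast first-report path only while it knows no blocks for the reporting storage. ToyStorage, incrementalReport, fullReport, and the two counters are hypothetical names and are not HDFS classes or methods.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; nothing here is a real HDFS class. */
public class ReportOrderSketch {

    /** Hypothetical stand-in for the namenode's view of one datanode storage. */
    static class ToyStorage {
        final Set<Long> blocks = new HashSet<>();
        int fastPathReports = 0;
        int slowPathReports = 0;

        /** Incremental report: records a few recently received blocks. */
        void incrementalReport(long... blockIds) {
            for (long id : blockIds) {
                blocks.add(id);
            }
        }

        /**
         * Full report. ASSUMPTION: the fast path is chosen only when no block
         * is known for this storage yet; otherwise the slow path is used.
         */
        void fullReport(long... blockIds) {
            if (blocks.isEmpty()) {
                fastPathReports++;
            } else {
                slowPathReports++;
            }
            for (long id : blockIds) {
                blocks.add(id);
            }
        }
    }

    public static void main(String[] args) {
        // Datanode flushes an incremental report before its first full report.
        ToyStorage flushedFirst = new ToyStorage();
        flushedFirst.incrementalReport(1L);
        flushedFirst.fullReport(1L, 2L, 3L);

        // Datanode sends only the full report.
        ToyStorage fullOnly = new ToyStorage();
        fullOnly.fullReport(1L, 2L, 3L);

        System.out.println("IBR first: slow path reports = " + flushedFirst.slowPathReports);
        System.out.println("full only: fast path reports = " + fullOnly.fastPathReports);
    }
}
```

In this toy model, the storage that flushed an incremental report first ends up on the slow path for its first full report, while the other one stays on the fast path.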



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode

2015-08-13 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-7980:
--
Attachment: (was: hadoop-241.patch)

> Incremental BlockReport will dramatically slow down the startup of a namenode
> --
>
> Key: HDFS-7980
> URL: https://issues.apache.org/jira/browse/HDFS-7980
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hui Zheng
>Assignee: Walter Su
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, 
> HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch
>
>





[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode

2015-08-13 Thread Zhihua Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696376#comment-14696376
 ] 

Zhihua Deng commented on HDFS-7980:
---

   
   We recently ran into the same problem on our cluster, which runs version 
2.4.1, and created a patch 
(https://github.com/dengzhhu653/hdfs-2.4.1/blob/master/hadoop-241.patch) 
based on the patches attached to this issue. The patch lets the restarted NN 
process the first full block report with the faster processFirstBlockReport 
method, and it adds the condition AddBlockResult.ADDED == result in the 
addStoredBlockImmediate method, so that incrementSafeBlockCount on 
FSNamesystem is invoked only when that condition holds.
   
   I am not sure whether the patch has any potential issues once it is applied 
to our cluster. Any advice and opinions would be greatly appreciated. Thanks!
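
To make the two changes easier to discuss, here is a toy Java sketch of the idea. It is not the patch itself and does not use real HDFS classes: ToyNameNode, AddResult, and the counters are hypothetical names, and the enum value ALREADY_PRESENT is an assumption made for illustration. The sketch assumes the namenode tracks how many full reports it has seen, and that the ADDED check is presumably there so that a replica already recorded by an earlier incremental report is not counted twice toward the safe block count.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; nothing here is a real HDFS class. */
public class FirstReportSketch {

    /** Hypothetical result type; ADDED mirrors the condition named in the comment. */
    enum AddResult { ADDED, ALREADY_PRESENT }

    /** Hypothetical stand-in for a restarted namenode tracking one storage. */
    static class ToyNameNode {
        private final Set<Long> storedBlocks = new HashSet<>();
        private int fullReportsSeen = 0;
        int fastPathReports = 0;
        int safeBlockCount = 0;

        /** Records a replica and says whether it was new to this namenode. */
        private AddResult addBlock(long id) {
            return storedBlocks.add(id) ? AddResult.ADDED : AddResult.ALREADY_PRESENT;
        }

        /** Counts a block toward the safe block count only if it was newly added. */
        private void countIfAdded(AddResult result) {
            if (result == AddResult.ADDED) {
                safeBlockCount++;
            }
        }

        void incrementalReport(long id) {
            countIfAdded(addBlock(id));
        }

        /** The first full report takes the fast path even after an incremental one. */
        void fullReport(long... ids) {
            if (fullReportsSeen == 0) {
                fastPathReports++;
            }
            fullReportsSeen++;
            for (long id : ids) {
                countIfAdded(addBlock(id));
            }
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.incrementalReport(1L);      // arrives before the first full report
        nn.fullReport(1L, 2L, 3L);     // block 1 is reported a second time
        System.out.println("fast path reports = " + nn.fastPathReports);
        System.out.println("safe block count  = " + nn.safeBlockCount);
    }
}
```

Without the ADDED check, block 1 in this toy run would be counted once by the incremental report and again by the full report.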

> Incremental BlockReport will dramatically slow down the startup of a namenode
> --
>
> Key: HDFS-7980
> URL: https://issues.apache.org/jira/browse/HDFS-7980
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hui Zheng
>Assignee: Walter Su
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, 
> HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch
>
>





[jira] [Created] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)
Zhihua Deng created HDFS-8985:
-

 Summary: Restarted namenode suffer from block report storm
 Key: HDFS-8985
 URL: https://issues.apache.org/jira/browse/HDFS-8985
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhihua Deng








[jira] [Updated] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-8985:
--
Affects Version/s: 2.4.1

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>






[jira] [Updated] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-8985:
--
Labels: performance  (was: )

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>  Labels: performance
>






[jira] [Updated] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-8985:
--
Issue Type: Test  (was: Improvement)

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>  Labels: performance
>






[jira] [Resolved] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HDFS-8985.
---
Resolution: Fixed

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>  Labels: performance
>






[jira] [Updated] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-8985:
--
Priority: Trivial  (was: Major)

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>Priority: Trivial
>  Labels: performance
>






[jira] [Updated] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhihua Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HDFS-8985:
--
Labels: test  (was: performance)

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>Priority: Trivial
>  Labels: test
>






[jira] [Commented] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-29 Thread Zhihua Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721057#comment-14721057
 ] 

Zhihua Deng commented on HDFS-8985:
---

yes, thank you.

> Restarted namenode suffer from block report storm
> -
>
> Key: HDFS-8985
> URL: https://issues.apache.org/jira/browse/HDFS-8985
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.4.1
>Reporter: Zhihua Deng
>Priority: Trivial
>  Labels: test
>





