[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907955#comment-13907955 ]

Vinayakumar B commented on HDFS-5496:

These failures do not appear in the second patch's test report. I think those tests failed because the first patch was missing the LightWeightGSet changes.

> Make replication queue initialization asynchronous
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Vinayakumar B
> Fix For: HDFS-5535 (Rolling upgrades)
> Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch
>
> Today, initialization of replication queues blocks safe mode exit and certain
> HA state transitions. For a big name space, this can take hundreds of seconds
> with the FSNamesystem write lock held. During this time, important requests
> (e.g. initial block reports, heartbeats, etc.) are blocked.
>
> The effect of delaying the initialization would be that replication does not
> start right away, but I think the benefit outweighs the cost. If we make it
> asynchronous, the work per iteration should be limited, so that the lock
> duration is capped.
>
> If full/incremental block reports and any other requests that modify block
> state properly perform replication checks while the blocks are scanned and
> the queues are populated in the background, every block will be processed.
> (Some may be done twice.) The replication monitor should run even before all
> blocks are processed.
>
> This will allow the namenode to exit safe mode and start serving immediately
> even with a big name space. It will also reduce the HA failover latency.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
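The issue description asks that the work per iteration be limited so that the lock duration is capped. A minimal Java sketch of that idea follows. It is not the HDFS-5496 patch: the class `ChunkedQueueInitializer`, its fields, and the `blocksPerIteration` parameter are hypothetical names chosen for illustration, and a plain `ReentrantReadWriteLock` stands in for the FSNamesystem write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of chunked queue initialization; not HDFS code. */
class ChunkedQueueInitializer {
  // Stand-in for the FSNamesystem write lock (assumption).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Integer> underReplicated = new ArrayList<>();
  private final int blocksPerIteration;
  private int lockAcquisitions = 0;

  ChunkedQueueInitializer(int blocksPerIteration) {
    this.blocksPerIteration = blocksPerIteration;
  }

  /** replicaCounts[i] is the number of live replicas of block i. */
  void initialize(int[] replicaCounts, int expectedReplication) {
    int next = 0;
    while (next < replicaCounts.length) {
      lock.writeLock().lock();
      try {
        lockAcquisitions++;
        // Scan at most blocksPerIteration blocks while holding the lock.
        int end = Math.min(next + blocksPerIteration, replicaCounts.length);
        for (; next < end; next++) {
          if (replicaCounts[next] < expectedReplication) {
            underReplicated.add(next);
          }
        }
      } finally {
        // Release between chunks so other requests can take the lock.
        lock.writeLock().unlock();
      }
    }
  }

  List<Integer> underReplicated() {
    return underReplicated;
  }

  int lockAcquisitions() {
    return lockAcquisitions;
  }
}
```

In a real namenode this loop would presumably run on a background thread; the sketch runs it inline to keep the bounded lock hold easy to see.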
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907404#comment-13907404 ]

Tsz Wo (Nicholas), SZE commented on HDFS-5496:

Hi Vinay, some replication-related tests failed in https://builds.apache.org/job/PreCommit-HDFS-Build/6189//testReport/. Could you take a look?
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850811#comment-13850811 ]

Jing Zhao commented on HDFS-5496:

I will commit this patch to the HDFS-5535 branch late today or early tomorrow if there are no objections.
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850195#comment-13850195 ]

Jing Zhao commented on HDFS-5496:

Thanks Vinay! The new patch looks good to me. [~kihwal], do you also want to take a look at the patch? Since we already have the HDFS-5535 branch, I think we can first commit this patch into that branch.

Besides, we may also want to add some unit tests to check that the replication queue is initialized successfully in different scenarios (HA/non-HA, NN failover, in/not in safemode, etc.). But I am also fine with adding the unit tests in a separate jira.
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850091#comment-13850091 ]

Hadoop QA commented on HDFS-5496:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12619013/HDFS-5496.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5737//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5737//console

This message is automatically generated.
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849961#comment-13849961 ]

Vinay commented on HDFS-5496:

Thanks for the explanation, Jing. I knew I was missing something; now I see that it was SafeMode#leave. I will prepare one more patch.
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849670#comment-13849670 ]

Jing Zhao commented on HDFS-5496:

bq. So it can completely miss initialization of replication queues itself.

So here, won't SafeMode#leave eventually call initializeReplQueues()? To summarize, suppose we make the following changes:

# in startActiveService(), change the condition for replication queue initialization to
{code}
!isInSafeMode()
{code}
# in checkMode(), change the condition to
{code}
canInitializeReplQueues() && !isPopulatingReplQueues() && !haEnabled
{code}
# and in SafeMode#leave, we still have
{code}
if (!isPopulatingReplQueues() && shouldPopulateReplQueues()) {
  initializeReplQueues();
}
{code}

Then in non-HA mode, we have:
# if the FSN enters safemode before starting the active service, checkMode()/leave() will initialize the replication queue
# if the FSN does not enter safemode, startActiveService() will initialize the replication queue

In HA mode, we have:
# checkMode() will no longer initialize the replication queue
# if the FSN enters safemode before starting the active service, SafeMode#leave will call initializeReplQueues()
# if the FSN does not enter safemode, startActiveService() will initialize the replication queue
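The four cases in the summary above can be written out as a small decision function. This is a simplified Java sketch, not NameNode code: the class and method names are hypothetical, `canInitializeReplQueues()` and `shouldPopulateReplQueues()` are assumed to return true, and the call order (checkMode, then SafeMode#leave, then startActiveService) is an assumption made only to keep the example short.

```java
/** Hypothetical model of the proposed guards; not HDFS code. */
class ReplQueueInitPaths {
  /** Returns the name of the call site that would initialize the queues. */
  static String firstInitializer(boolean haEnabled, boolean enteredSafeMode) {
    boolean populating = false;    // isPopulatingReplQueues(): nothing started yet
    boolean canInitialize = true;  // canInitializeReplQueues(): assumed true
    boolean shouldPopulate = true; // shouldPopulateReplQueues(): assumed true

    if (enteredSafeMode) {
      // Proposed checkMode() guard: skipped entirely when HA is enabled.
      if (canInitialize && !populating && !haEnabled) {
        return "checkMode";
      }
      // Unchanged SafeMode#leave guard.
      if (!populating && shouldPopulate) {
        return "SafeMode#leave";
      }
    }
    // Proposed startActiveService() guard: !isInSafeMode().
    return "startActiveService";
  }
}
```

Under these assumptions every combination of HA/non-HA and safemode/no safemode reaches exactly one initializer, which is the point of the summary.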
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849110#comment-13849110 ]

Hadoop QA commented on HDFS-5496:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12618878/HDFS-5496.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5730//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5730//console

This message is automatically generated.
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848975#comment-13848975 ]

Hadoop QA commented on HDFS-5496:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12618876/HDFS-5496.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5729//console

This message is automatically generated.
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848962#comment-13848962 ]

Vinay commented on HDFS-5496:

I also added a method to get the progress of the initialization, which can be used later to show it in the UI.
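The comment above does not show the progress method itself. As a rough Java illustration of what such a method could look like, the sketch below reports the fraction of blocks scanned so far. The class `ReplQueueInitProgress` and all of its members are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical progress tracker for queue initialization; not HDFS code. */
class ReplQueueInitProgress {
  private final long totalBlocks;
  // Updated by the scanning thread, read by a UI or metrics thread.
  private final AtomicLong processed = new AtomicLong();

  ReplQueueInitProgress(long totalBlocks) {
    this.totalBlocks = totalBlocks;
  }

  /** Called by the scanner after each chunk of blocks. */
  void blocksProcessed(long count) {
    processed.addAndGet(count);
  }

  /** Fraction of blocks scanned so far, between 0.0 and 1.0. */
  double getProgress() {
    if (totalBlocks == 0) {
      return 1.0; // nothing to scan, so initialization is trivially complete
    }
    return Math.min(1.0, (double) processed.get() / totalBlocks);
  }
}
```

An `AtomicLong` is used here so that a reader thread can poll the value without taking the namesystem lock; whether the actual patch does this is not stated in the thread.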
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848939#comment-13848939 ]

Vinay commented on HDFS-5496:

{quote}
How about changing this to
{code}
if (!isInSafeMode() || ((isInSafeMode() && safeMode.isPopulatingReplQueues()) && haEnabled))
{code}
I.e., in non-HA setup, maybe we do not need to restart the processing since the NN already loads all the editlog before entering safemode?

And in checkMode(), can we change
{code}
if (canInitializeReplQueues() && !isPopulatingReplQueues()) {
  initializeReplQueues();
}
{code}
to
{code}
if (canInitializeReplQueues() && !isPopulatingReplQueues() && !haEnabled) {
  initializeReplQueues();
}
{code}
because in HA setup we will run processMisReplicateBlocks in startActiveService.
{quote}

I was thinking about this again, and I still have some doubts. The above change will avoid reprocessing in non-HA mode. But in HA mode, if startActiveServices() is called before safemode reaches its threshold, the following check will fail and the call to initialize the queues will be skipped:
{code}
if (!isInSafeMode() || ((isInSafeMode() && safeMode.isPopulatingReplQueues()) && haEnabled))
{code}
In SafeMode#checkMode(), initialization will also be skipped because of the haEnabled check:
{code}
if (canInitializeReplQueues() && !isPopulatingReplQueues() && !haEnabled)
{code}
So initialization of the replication queues can be missed completely. Am I missing something?
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847744#comment-13847744 ]

Jing Zhao commented on HDFS-5496:

Thanks for the further explanation, Vinay. Now I see what you mean.
{code}
blockManager.clearQueues();
blockManager.processAllPendingDNMessages();
{code}
In my previous comment, I thought you also planned to remove the clearQueues() call. So if we clear all the queues and continue with the ongoing initialization, the blocks that were processed before the startActiveService() call will not be re-processed? I.e., we will end up generating the replication queues for only part of the blocks.
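The concern about clearing the queues in the middle of an ongoing scan can be shown with a toy model. The Java sketch below is only an illustration under simplifying assumptions: a single list stands in for the replication queues, `clearAt` is a hypothetical parameter marking the block index at which a `clearQueues()`-style reset happens, and none of the names come from HDFS.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of clearing the queue while a scan is in progress; not HDFS code. */
class ClearDuringScan {
  /**
   * Scans blocks in order and queues the under-replicated ones.
   * If clearAt is a valid index, the queue is emptied just before
   * that block is examined and the scan then simply continues.
   */
  static List<Integer> scan(int[] replicaCounts, int expectedReplication, int clearAt) {
    List<Integer> queue = new ArrayList<>();
    for (int i = 0; i < replicaCounts.length; i++) {
      if (i == clearAt) {
        queue.clear(); // models a clearQueues() call arriving mid-scan
      }
      if (replicaCounts[i] < expectedReplication) {
        queue.add(i);
      }
    }
    return queue;
  }
}
```

With four under-replicated blocks and a clear before block 2, only blocks 2 and 3 remain queued, so the scan would have to restart from the beginning to cover every block.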
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847063#comment-13847063 ]

Vinay commented on HDFS-5496:

bq. So for that part of the change, can we make sure that markAllDatanodesStale() is called before calling processMisReplicateBlocks?

Currently in {{startActiveServices()}}, markAllDatanodesStale() is called before processMisReplicateBlocks(). The queues are also cleared unconditionally at this point, which destroys the result of the ongoing initialization.
{code}
blockManager.setPostponeBlocksFromFuture(false);
blockManager.getDatanodeManager().markAllDatanodesStale();
blockManager.clearQueues();
blockManager.processAllPendingDNMessages();
{code}
So that means that if we continue with the ongoing initialization, all further blocks will only be added to the postponed list, right? Are we missing something here?
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847055#comment-13847055 ]

Jing Zhao commented on HDFS-5496:

So for that part of the change, can we make sure that markAllDatanodesStale() is called before calling processMisReplicateBlocks? If we have not marked all DNs as stale, "num.replicasOnStaleNodes() > 0" may not be true. This can happen when we call initializeReplQueues() while in safemode and before calling startActiveService().
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847049#comment-13847049 ] Vinay commented on HDFS-5496: - I think, block processing will be postponed, not put into invalid queue. Because of the below change in the patch {code}+// postpone making any decision with stale replicas +if (numCurrentReplica > expectedReplication +&& num.replicasOnStaleNodes() > 0) { + // If any of the replicas of this block are on nodes that are + // considered "stale", then these replicas may in fact have + // already been deleted. So, we cannot safely act on the + // over-replication until a later point in time, when + // the "stale" nodes have block reported. + return MisReplicationResult.POSTPONE; +}{code} right? In that case, better to re-initialize? > Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Kihwal Lee >Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch > > > Today, initialization of replication queues blocks safe mode exit and certain > HA state transitions. For a big name space, this can take hundreds of seconds > with the FSNamesystem write lock held. During this time, important requests > (e.g. initial block reports, heartbeat, etc) are blocked. > The effect of delaying the initialization would be not starting replication > right away, but I think the benefit outweighs. If we make it asynchronous, > the work per iteration should be limited, so that the lock duration is > capped. > If full/incremental block reports and any other requests that modifies block > state properly performs replication checks while the blocks are scanned and > the queues populated in background, every block will be processed. (Some may > be done twice) The replication monitor should run even before all blocks are > processed. 
> This will allow namenode to exit safe mode and start serving immediately even > with a big name space. It will also reduce the HA failover latency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
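The postponement rule quoted in the comment above can be sketched as a small standalone classifier. This is a simplified illustration and not the BlockManager code: the class `MisReplicationSketch`, its `classify` method, and the reduced `Result` enum are hypothetical names, and only the `POSTPONE` condition mirrors the patch excerpt.

```java
/** Simplified, hypothetical model of the postponement rule; not the real BlockManager code. */
final class MisReplicationSketch {
    /** Reduced result set, for illustration only. */
    enum Result { UNDER_REPLICATED, OVER_REPLICATED, POSTPONE, OK }

    static Result classify(int numCurrentReplica, int expectedReplication, int replicasOnStaleNodes) {
        if (numCurrentReplica > expectedReplication) {
            // Replicas on stale nodes may already have been deleted, so defer the decision.
            return replicasOnStaleNodes > 0 ? Result.POSTPONE : Result.OVER_REPLICATED;
        }
        return numCurrentReplica < expectedReplication ? Result.UNDER_REPLICATED : Result.OK;
    }

    public static void main(String[] args) {
        System.out.println(classify(4, 3, 1)); // over-replicated, one replica on a stale node
        System.out.println(classify(4, 3, 0)); // over-replicated, no stale replicas
        System.out.println(classify(2, 3, 0)); // under-replicated
    }
}
```

In this sketch a postponed block is neither invalidated nor queued for replication; it would simply be revisited once the stale nodes have sent block reports.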
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847044#comment-13847044 ] Jing Zhao commented on HDFS-5496: -
In an HA setup, markAllDatanodesStale() is currently called only in startActiveService(). So I think initializeReplQueues(), which is called before the datanodes are marked as stale, can wrongly put a block into the invalidate queue (see HDFS-1972).
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847026#comment-13847026 ] Vinay commented on HDFS-5496: -
Hi Jing, I think that would work. With that change, multiple initializations will not happen in non-HA mode. In an HA setup, if safe mode is still in its extension period after {{initializeReplQueues()}} has been called by the time {{startActiveServices()}} runs, re-initialization will be triggered. My only remaining question is whether, at that point, we need to restart the initialization or can let an ongoing initialization continue. I feel it's OK to continue. Any thoughts?
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846862#comment-13846862 ] Jing Zhao commented on HDFS-5496: -
bq. if (!isInSafeMode() && haEnabled) will fail to detect the need to restart the initialization.
How about changing this to
{code}
if (!isInSafeMode() ||
    ((isInSafeMode() && safeMode.isPopulatingReplQueues()) && haEnabled)) {
{code}
I.e., in non-HA setup, maybe we do not need to restart the processing since the NN already loads all the editlog before entering safemode?
And in checkMode(), can we change
{code}
if (canInitializeReplQueues() && !isPopulatingReplQueues()) {
  initializeReplQueues();
}
{code}
to
{code}
if (canInitializeReplQueues() && !isPopulatingReplQueues() && !haEnabled) {
  initializeReplQueues();
}
{code}
because in HA setup we will run processMisReplicateBlocks in startActiveService.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
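To compare the two reprocessing conditions discussed in the comment above, the following sketch evaluates both over every combination of the three flags. The class and method names are hypothetical, and the flags are plain booleans standing in for `isInSafeMode()`, `safeMode.isPopulatingReplQueues()`, and `haEnabled`; the boolean expressions themselves are copied from the thread.

```java
/** Hypothetical truth-table helper for the two conditions quoted in the thread. */
final class ReprocessCondition {
    /** Condition from the patch under review: !isInSafeMode() && haEnabled. */
    static boolean patchCondition(boolean inSafeMode, boolean haEnabled) {
        return !inSafeMode && haEnabled;
    }

    /** Condition proposed in the comment, with the same grouping of operands. */
    static boolean proposedCondition(boolean inSafeMode, boolean populatingReplQueues, boolean haEnabled) {
        return !inSafeMode || ((inSafeMode && populatingReplQueues) && haEnabled);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean safe : values) {
            for (boolean populating : values) {
                for (boolean ha : values) {
                    System.out.printf("safeMode=%b populating=%b ha=%b patch=%b proposed=%b%n",
                        safe, populating, ha,
                        patchCondition(safe, ha),
                        proposedCondition(safe, populating, ha));
                }
            }
        }
    }
}
```

The row of interest is safe mode on, queues being populated, HA enabled: the patch condition is false there, while the proposed condition is true, which is the case where a restart of the initialization would otherwise be missed.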
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846783#comment-13846783 ] Kihwal Lee commented on HDFS-5496: --
I just want to provide a data point regarding the number of blocks to process per iteration. According to a measurement against a name node with a big name space, the initialization throughput was a little over 300K blocks/second. On this machine, the default limit of 10K blocks will be equivalent to about 33 ms of write lock duration. There is pressure to scan all blocks as soon as possible to avoid data loss caused by delayed replication, so a higher limit may be preferred in some situations. But I think it is okay as a default value.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
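The write lock hold time per iteration is the per-iteration block limit divided by the scan throughput. A minimal sketch of that arithmetic follows; the class `LockBudget` and its methods are hypothetical, and the 100 million block namespace used for the iteration count is an assumed figure, not one taken from the thread.

```java
/** Hypothetical helper for estimating lock hold time per scan iteration. */
final class LockBudget {
    /** Milliseconds the lock is held while scanning one batch of blocks. */
    static double lockMillisPerIteration(long blocksPerIteration, double blocksPerSecond) {
        return blocksPerIteration / blocksPerSecond * 1000.0;
    }

    /** Number of iterations needed to scan the whole namespace (rounded up). */
    static long iterations(long totalBlocks, long blocksPerIteration) {
        return (totalBlocks + blocksPerIteration - 1) / blocksPerIteration;
    }

    public static void main(String[] args) {
        // 10K blocks per iteration at 300K blocks/second: about 33.3 ms per lock hold.
        System.out.printf("%.1f ms per iteration%n", lockMillisPerIteration(10_000, 300_000.0));
        // Assumed namespace of 100M blocks: 10,000 iterations.
        System.out.println(iterations(100_000_000L, 10_000) + " iterations");
    }
}
```

Raising the limit shortens the total scan at the cost of longer individual lock holds, which is the trade-off the comment describes.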
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846751#comment-13846751 ] Kihwal Lee commented on HDFS-5496: --
> If I understand correctly, Need to restart initializing queues right?
Yes. Since the initialization is now asynchronous, a new kind of problem can occur. A standby node in safe mode can transition to active, and checkMode() then causes the initialization to start (i.e., the node enters the safe mode extension). At this point, if it transitions back to standby and again to active while in the safe mode extension period, {{if (!isInSafeMode() && haEnabled)}} will fail to detect the need to restart the initialization.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846178#comment-13846178 ] Vinay commented on HDFS-5496: -
bq. The following change would have been fine if leaving safe mode and initializing replication queues were synchronized. It appears checkMode() can start a background initialization before leaving the safe mode. Since the queues are unconditionally cleared right before the following, an on-going initialization should be stopped and redone.
If I understand correctly, Need to restart initializing queues right?
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845580#comment-13845580 ] Kihwal Lee commented on HDFS-5496: --
The following change would have been fine if leaving safe mode and initializing replication queues were synchronized. It appears {{checkMode()}} can start a background initialization before leaving the safe mode. Since the queues are unconditionally cleared right before the following, an on-going initialization should be stopped and redone.
{code}
-if (!isInSafeMode() ||
-    (isInSafeMode() && safeMode.isPopulatingReplQueues())) {
+// We only need to reprocess the queue in HA mode and not in safemode
+if (!isInSafeMode() && haEnabled) {
{code}
There have been discussions regarding removing safe mode extension and perhaps safe mode monitor. That will make the check/logic simpler.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Assignee: Vinay > Attachments: HDFS-5496.patch, HDFS-5496.patch
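The "stopped and redone" requirement, combined with the capped work per iteration, can be illustrated with a small restartable scanner. This is a sketch under stated assumptions, not the patch: `ChunkedQueueInitializer` and its methods are hypothetical, blocks are modeled as plain `Long` IDs, every block is simply appended to one queue, and `synchronized` stands in for the FSNamesystem write lock.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical restartable scanner that does bounded work per lock hold. */
final class ChunkedQueueInitializer {
    private final List<Long> blocks;
    private final int blocksPerIteration;
    private final List<Long> queue = new ArrayList<>();
    private int cursor = 0;

    ChunkedQueueInitializer(List<Long> blocks, int blocksPerIteration) {
        this.blocks = blocks;
        this.blocksPerIteration = blocksPerIteration;
    }

    /** Clears the queue and rewinds the scan, so a cleared queue is never left half filled. */
    synchronized void restart() {
        queue.clear();
        cursor = 0;
    }

    /** Processes at most blocksPerIteration blocks; returns true while blocks remain. */
    synchronized boolean step() {
        int end = Math.min(cursor + blocksPerIteration, blocks.size());
        for (; cursor < end; cursor++) {
            queue.add(blocks.get(cursor));
        }
        return cursor < blocks.size();
    }

    synchronized int queued() {
        return queue.size();
    }
}
```

A background thread would call `step()` in a loop and release the lock between calls; a state transition that clears the queues would call `restart()` so the next `step()` begins again from the first block.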
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845529#comment-13845529 ] Kihwal Lee commented on HDFS-5496: --
It would be nice if the web UI indicated when the replication queues are being initialized. Showing the progress would be a plus.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845150#comment-13845150 ] Hadoop QA commented on HDFS-5496: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618178/HDFS-5496.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5693//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5693//console
This message is automatically generated.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845113#comment-13845113 ] Jing Zhao commented on HDFS-5496: -
bq. means just return the call if already initialization is in progress?
Yes, I think so, provided my two assumptions hold.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Attachments: HDFS-5496.patch, HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845046#comment-13845046 ] Vinay commented on HDFS-5496: -
bq. For (4), looks like currently we only retrieve metrics information from postponedMisreplicatedBlocks and we always check if the corresponding DNs are still stale before we make INVALIDATE decision. Thus it should be safe if we delay its initialization.
For this I am trying to make some changes in the patch. I hope the next patch will include them.
bq. For (2), currently we add under-replicated blocks into neededReplications when 1) initially populating the replication queue, 2) checking replication when finalizing an under-construction file, 3) checking replication progress for decommissioning DN, and 4) pending replicas timeout. Delaying 1) and making it happen in parallel with 2)~4) should also be safe.
I guess this is already in place, i.e. UnderReplicated Blocks are not added to neededReplications in {{processMisReplicatedBlock(..)}}.
{code}
if (!block.isComplete()) {
  // Incomplete blocks are never considered mis-replicated --
  // they'll be reached when they are completed or recovered.
  return MisReplicationResult.UNDER_CONSTRUCTION;
}
{code}
bq. For the current patch, I understand we need a new iterator that can iterate the blocksMap and not throw exception when concurrent modifications happen. However, I guess we may only need to define a new iterator and do not need to define the new BlocksMapGSet here. Also, since the new iterator shares most of the code with the existing LightWeightGSet#SetIterator, maybe we can simply extend SetIterator here?
Yes. Sure.
bq. So for case 3, in non-HA setup, I think maybe we do not need to restart the processing since there should not be any pending editlog for NN to process in startActiveService? In HA setup, since we can always run processMisReplicateBlocks in startActiveService, we actually do not need to populate replication queue while still in safemode? If we're able to make these two changes, for the current patch, we do not need to worry about some already-running replication initializing thread.
This can be done. "do not need to worry about already-running replication initializing" means just return the call if already initialization is in progress?
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Attachments: HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844737#comment-13844737 ] Jing Zhao commented on HDFS-5496: -
Another question is that, currently we call processMisReplicateBlocks when 1) starting active service, 2) leaving safemode, and 3) before leaving safemode if blockReplQueueThreshold is met. Specifically, with or without HA setup, we call processMisReplicateBlocks in the following cases:
# NN is not in safemode
# NN is in safemode, but we have not populated replication queue yet
# NN is in safemode, and we have already started populating the replication queue. We will restart the processing here.
So for case 3, in non-HA setup, I think maybe we do not need to restart the processing since there should not be any pending editlog for NN to process in startActiveService? In HA setup, since we can always run processMisReplicateBlocks in startActiveService, we actually do not need to populate replication queue while still in safemode? If we're able to make these two changes, for the current patch, we do not need to worry about some already-running replication initializing thread.
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Attachments: HDFS-5496.patch
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844607#comment-13844607 ] Jing Zhao commented on HDFS-5496: -
Hi [~vinayrpet], your proposed solution looks good to me. Currently the processMisReplicatedBlock method updates 4 different data structures: (1) invalidateBlocks, storing blocks that do not belong to any file; (2) neededReplications, storing blocks that need to be replicated; (3) excessReplicateMap, tracking over-replicated blocks (these blocks are added to invalidateBlocks too); and (4) postponedMisreplicatedBlocks, storing blocks that seem over-replicated but for which we still need to wait for the deletion report from the corresponding DNs.
For (4), looks like currently we only retrieve metrics information from postponedMisreplicatedBlocks and we always check if the corresponding DNs are still stale before we make INVALIDATE decision. Thus it should be safe if we delay its initialization.
For (2), currently we add under-replicated blocks into neededReplications when 1) initially populating the replication queue, 2) checking replication when finalizing an under-construction file, 3) checking replication progress for decommissioning DN, and 4) pending replicas timeout. Delaying 1) and making it happen in parallel with 2)~4) should also be safe.
For the current patch, I understand we need a new iterator that can iterate the blocksMap and not throw exception when concurrent modifications happen. However, I guess we may only need to define a new iterator and do not need to define the new BlocksMapGSet here. Also, since the new iterator shares most of the code with the existing LightWeightGSet#SetIterator, maybe we can simply extend SetIterator here?
> Make replication queue initialization asynchronous > -- > > Key: HDFS-5496 > URL: https://issues.apache.org/jira/browse/HDFS-5496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Kihwal Lee > Attachments: HDFS-5496.patch
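One way to picture an iterator that keeps going when the set changes underneath it is to walk the hash buckets by index and copy one bucket at a time. The sketch below uses that swapped-in technique purely for illustration; it is not LightWeightGSet or its SetIterator, and `BucketSet`, `tolerantIterator`, and the `Long` block IDs are hypothetical. It assumes modifications happen between calls (for example, between lock holds), not concurrently on another thread without a lock.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical fixed-size chained hash set with a modification-tolerant iterator. */
final class BucketSet {
    private final List<List<Long>> buckets = new ArrayList<>();

    BucketSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private int index(long id) {
        return (int) Math.floorMod(id, (long) buckets.size());
    }

    void add(long id) {
        List<Long> bucket = buckets.get(index(id));
        if (!bucket.contains(id)) {
            bucket.add(id);
        }
    }

    void remove(long id) {
        buckets.get(index(id)).remove(Long.valueOf(id));
    }

    /** Copies one bucket at a time and never checks a modification count. */
    Iterator<Long> tolerantIterator() {
        return new Iterator<Long>() {
            private int nextBucket = 0;
            private Iterator<Long> current = new ArrayList<Long>().iterator();

            @Override
            public boolean hasNext() {
                // Advance to the next non-empty bucket, snapshotting it on entry.
                while (!current.hasNext() && nextBucket < buckets.size()) {
                    current = new ArrayList<>(buckets.get(nextBucket++)).iterator();
                }
                return current.hasNext();
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

With this design, an element added to an already visited bucket is missed by the scan and an element removed from a bucket not yet visited is skipped, so the scan never fails but relies on other code paths (such as the block report handling mentioned in the issue description) to cover blocks that change during the scan.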
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832346#comment-13832346 ]

Hadoop QA commented on HDFS-5496:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12615774/HDFS-5496.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5570//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5570//console

This message is automatically generated.

> Make replication queue initialization asynchronous
> --------------------------------------------------
>
>                 Key: HDFS-5496
>                 URL: https://issues.apache.org/jira/browse/HDFS-5496
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Kihwal Lee
>         Attachments: HDFS-5496.patch
>
>
> Today, initialization of replication queues blocks safe mode exit and certain
> HA state transitions. For a big name space, this can take hundreds of seconds
> with the FSNamesystem write lock held. During this time, important requests
> (e.g. initial block reports, heartbeats, etc.) are blocked.
> The effect of delaying the initialization would be not starting replication
> right away, but I think the benefit outweighs the cost. If we make it
> asynchronous, the work per iteration should be limited, so that the lock
> duration is capped.
> If full/incremental block reports and any other requests that modify block
> state properly perform replication checks while the blocks are scanned and
> the queues populated in the background, every block will be processed. (Some
> may be done twice.) The replication monitor should run even before all blocks
> are processed.
> This will allow the namenode to exit safe mode and start serving immediately
> even with a big name space. It will also reduce the HA failover latency.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
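The issue description above proposes scanning blocks in the background with a capped amount of work per lock acquisition. A minimal Java sketch of that idea follows. It is not the code from HDFS-5496.patch: the class name, the `blocksPerIteration` parameter, the list of replica counts standing in for the blocks map, and the `ReentrantReadWriteLock` standing in for the FSNamesystem write lock are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: none of these names come from the HDFS code base.
class ChunkedQueueInit {
    // Stand-in for the FSNamesystem write lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the under-replicated blocks queue (stores block indices).
    private final List<Integer> underReplicated = new ArrayList<>();
    private int lockAcquisitions = 0;

    /**
     * Scans the replica counts in chunks of at most blocksPerIteration,
     * taking the write lock once per chunk and releasing it in between,
     * so other requests are never blocked for longer than one chunk.
     * The input list is assumed not to change during the scan.
     */
    void initialize(List<Integer> liveReplicaCounts, int expectedReplicas,
                    int blocksPerIteration) {
        Iterator<Integer> it = liveReplicaCounts.iterator();
        int blockIndex = 0;
        while (it.hasNext()) {
            lock.writeLock().lock();
            lockAcquisitions++;
            try {
                for (int i = 0; i < blocksPerIteration && it.hasNext(); i++) {
                    if (it.next() < expectedReplicas) {
                        underReplicated.add(blockIndex);
                    }
                    blockIndex++;
                }
            } finally {
                lock.writeLock().unlock();
            }
            // Give waiting threads a chance to take the lock before the next chunk.
            Thread.yield();
        }
    }

    List<Integer> getUnderReplicated() {
        return underReplicated;
    }

    int getLockAcquisitions() {
        return lockAcquisitions;
    }
}
```

To make the scan asynchronous in this sketch, a caller could run it on its own thread, for example `new Thread(() -> init.initialize(counts, 3, 10_000)).start()`. A real blocks map can change between chunks, which is why the description relies on block reports performing their own replication checks while the scan is in progress.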
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819258#comment-13819258 ]

Aaron T. Myers commented on HDFS-5496:
--------------------------------------

+1, we should definitely do this. In the very worst case I've seen, the time taken to process the blocks map exceeded the 45s timeout of the ZKFC for the NN to become active, which can cause the NNs to flap back and forth between the active and standby states ad infinitum until one is manually killed. In the past I've upped this timeout to work around this issue, but really we should get the blocks map/replication queue processing out of the critical path of the NN transitioning to the active state.

> Make replication queue initialization asynchronous
> --------------------------------------------------
>
>                 Key: HDFS-5496
>                 URL: https://issues.apache.org/jira/browse/HDFS-5496
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Kihwal Lee
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819247#comment-13819247 ]

Kihwal Lee commented on HDFS-5496:
----------------------------------

With 125M blocks in BlocksMap, checking all blocks without adding anything to the under-replicated blocks queue took about 28 seconds. If there is a large number of under-replicated blocks, the initialization time can increase significantly. There is also an issue with the under-replicated blocks queue that makes it worse.

> Make replication queue initialization asynchronous
> --------------------------------------------------
>
>                 Key: HDFS-5496
>                 URL: https://issues.apache.org/jira/browse/HDFS-5496
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Kihwal Lee
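Kihwal Lee's measurement (125M blocks checked in about 28 seconds) gives a rough way to estimate how long the lock would be held per iteration if the scan were chunked. The Java sketch below does that arithmetic. It assumes the scan time scales linearly with the number of blocks and that nothing is added to the under-replicated queue; the chunk size of 10,000 blocks is a made-up value for illustration, not one taken from the issue or the patch.

```java
// Hypothetical helper: a back-of-the-envelope estimate, not a benchmark.
class LockHoldEstimate {
    /**
     * Estimated seconds to scan `blocks` blocks, assuming the per-block cost
     * observed when scanning measuredBlocks in measuredSeconds stays constant.
     */
    static double scanSeconds(long blocks, long measuredBlocks, double measuredSeconds) {
        return blocks * (measuredSeconds / measuredBlocks);
    }

    public static void main(String[] args) {
        // 125M blocks in about 28 seconds, as reported in the comment.
        double perChunk = scanSeconds(10_000, 125_000_000L, 28.0);
        System.out.printf("Estimated lock hold per 10,000-block chunk: %.2f ms%n",
                perChunk * 1000.0);
    }
}
```

Under these assumptions the per-block cost is roughly 0.22 microseconds, so a chunk of 10,000 blocks would hold the lock for a little over 2 milliseconds. As the comment notes, a large number of under-replicated blocks would push the real figure higher.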