[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2014-02-20 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907955#comment-13907955
 ] 

Vinayakumar B commented on HDFS-5496:
-

These failures do not appear in the second patch's test report. 
I think they failed because the first patch was missing the LightWeightGSet 
changes.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Vinayakumar B
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
> HDFS-5496.patch, HDFS-5496.patch
>
>
> Today, initialization of replication queues blocks safe mode exit and certain 
> HA state transitions. For a big name space, this can take hundreds of seconds 
> with the FSNamesystem write lock held.  During this time, important requests 
> (e.g. initial block reports, heartbeat, etc) are blocked.
> The effect of delaying the initialization would be not starting replication 
> right away, but I think the benefit outweighs the cost. If we make it 
> asynchronous, the work per iteration should be limited, so that the lock 
> duration is capped. 
> If full/incremental block reports and any other requests that modify block 
> state properly perform replication checks while the blocks are scanned and 
> the queues are populated in the background, every block will be processed. 
> (Some may be done twice.)  The replication monitor should run even before all 
> blocks are processed.
> This will allow namenode to exit safe mode and start serving immediately even 
> with a big name space. It will also reduce the HA failover latency.
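
The bounded-iteration idea in the description can be sketched as follows. This is a minimal illustration, not the code in the attached patch: the class name, the blocksPerIteration cap, and the integer stand-ins for blocks are all assumptions made for the example. A background thread scans the blocks in fixed-size batches and releases the write lock between batches, so other requests never wait longer than one batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the HDFS-5496 patch. All names are illustrative.
class ChunkedQueueInitSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Integer> liveReplicas;      // live replica count per block
  private final int expectedReplication;
  private final int blocksPerIteration;          // assumed cap on work per lock hold
  private final List<Integer> neededReplications = new ArrayList<>();
  private int lockAcquisitions = 0;

  ChunkedQueueInitSketch(List<Integer> liveReplicas, int expectedReplication,
      int blocksPerIteration) {
    if (blocksPerIteration <= 0) {
      throw new IllegalArgumentException("blocksPerIteration must be positive");
    }
    this.liveReplicas = liveReplicas;
    this.expectedReplication = expectedReplication;
    this.blocksPerIteration = blocksPerIteration;
  }

  // Scans every block, taking the write lock once per bounded batch.
  void scanAll() {
    int next = 0;
    while (next < liveReplicas.size()) {
      lock.writeLock().lock();
      try {
        lockAcquisitions++;
        int end = Math.min(next + blocksPerIteration, liveReplicas.size());
        for (; next < end; next++) {
          if (liveReplicas.get(next) < expectedReplication) {
            neededReplications.add(next);  // queue the under-replicated block index
          }
        }
      } finally {
        lock.writeLock().unlock();  // waiting requests can acquire the lock here
      }
    }
  }

  List<Integer> neededReplications() { return neededReplications; }
  int lockAcquisitions() { return lockAcquisitions; }

  public static void main(String[] args) throws InterruptedException {
    ChunkedQueueInitSketch sketch =
        new ChunkedQueueInitSketch(Arrays.asList(3, 1, 3, 2, 3, 0, 3), 3, 2);
    Thread initializer = new Thread(sketch::scanAll, "repl-queue-init");
    initializer.start();
    initializer.join();
    System.out.println(sketch.neededReplications()
        + " queued in " + sketch.lockAcquisitions() + " batches");
  }
}
```

The batch size trades total scan time against the longest lock hold. A real implementation would also have to handle blocks that change between batches, which the description covers by having block reports perform their own replication checks.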



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907404#comment-13907404
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5496:
--

Hi Vinay, some replication-related tests failed in 
https://builds.apache.org/job/PreCommit-HDFS-Build/6189//testReport/ .  Could 
you take a look?



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-17 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850811#comment-13850811
 ] 

Jing Zhao commented on HDFS-5496:
-

I will commit this patch to HDFS-5535 branch late today or early tomorrow if 
there is no objection.



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850195#comment-13850195
 ] 

Jing Zhao commented on HDFS-5496:
-

Thanks Vinay! The new patch looks good to me. [~kihwal], do you also want to 
take a look at the patch?

Since we already have the HDFS-5535 branch, I think we can first commit this 
patch to the HDFS-5535 branch. We may also want to add some unit tests that 
check whether the replication queues are initialized successfully in different 
scenarios (HA/non-HA, NN failover, in/not in safemode, etc.). It is also fine 
with me to add the unit tests in a separate jira.



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850091#comment-13850091
 ] 

Hadoop QA commented on HDFS-5496:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619013/HDFS-5496.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5737//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5737//console

This message is automatically generated.



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849961#comment-13849961
 ] 

Vinay commented on HDFS-5496:
-

Thanks for the explanation, Jing. I knew I had missed something: I overlooked 
SafeMode#leave.
I will prepare one more patch. 



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849670#comment-13849670
 ] 

Jing Zhao commented on HDFS-5496:
-

bq. So it can completely miss initialization of replication queues itself.
In that case, won't SafeMode#leave eventually call initializeReplQueues()?

In summary, if we make the following changes:
# in startActiveService(), change the condition of replication queue 
initialization to 
{code}
!isInSafeMode()
{code}
# in checkMode(), change the condition to 
{code}
canInitializeReplQueues() && !isPopulatingReplQueues() && !haEnabled
{code}
# and in SafeMode#leave, we still have
{code}
if (!isPopulatingReplQueues() && shouldPopulateReplQueues()) {
  initializeReplQueues();
}
{code}

Then in non-HA mode, we have:
# if the FSN enters safemode before starting active service, 
checkMode()/leave() will initialize the replication queue
# if the FSN does not enter safemode, startActiveService will initialize the 
replication queue

In HA mode, we have:
# checkMode will no longer initialize replication queue
# if the FSN enters safemode before starting active service, SafeMode#leave 
will call initializeReplQueues
# if FSN does not enter safemode, startActiveService will initialize 
replication queue
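
The case analysis above can be checked with a small model. This is only a sketch under stated assumptions: the class below is hypothetical and mirrors the three conditions from this comment, not the actual FSNamesystem code, and it assumes startActiveService() runs before checkMode() and leave(), as in the cases listed.

```java
// Hypothetical model of the three checks summarized above. The method names
// mirror the comment, but none of this is the real FSNamesystem code.
class ReplQueueInitDecisionSketch {
  private final boolean haEnabled;
  private boolean inSafeMode;
  private boolean populatingReplQueues = false;
  private int initCalls = 0;

  ReplQueueInitDecisionSketch(boolean haEnabled, boolean inSafeMode) {
    this.haEnabled = haEnabled;
    this.inSafeMode = inSafeMode;
  }

  private void initializeReplQueues() {
    populatingReplQueues = true;
    initCalls++;
  }

  // 1. startActiveService(): initialize only when not in safe mode.
  void startActiveService() {
    if (!inSafeMode) {
      initializeReplQueues();
    }
  }

  // 2. checkMode(): early initialization inside safe mode, non-HA only.
  void checkMode(boolean canInitializeReplQueues) {
    if (canInitializeReplQueues && !populatingReplQueues && !haEnabled) {
      initializeReplQueues();
    }
  }

  // 3. SafeMode#leave: initialize if nothing has done so yet.
  void leave(boolean shouldPopulateReplQueues) {
    if (!populatingReplQueues && shouldPopulateReplQueues) {
      initializeReplQueues();
    }
    inSafeMode = false;
  }

  // Runs one startup scenario and returns how often the queues were initialized.
  static int run(boolean haEnabled, boolean startsInSafeMode) {
    ReplQueueInitDecisionSketch fsn =
        new ReplQueueInitDecisionSketch(haEnabled, startsInSafeMode);
    fsn.startActiveService();
    if (startsInSafeMode) {
      fsn.checkMode(true);
      fsn.leave(true);
    }
    return fsn.initCalls;
  }

  public static void main(String[] args) {
    for (boolean ha : new boolean[] {false, true}) {
      for (boolean safeMode : new boolean[] {true, false}) {
        System.out.println("haEnabled=" + ha + " safeMode=" + safeMode
            + " initCalls=" + run(ha, safeMode));
      }
    }
  }
}
```

Under these assumptions every combination initializes the queues exactly once, which is the property the summary argues for.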



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849110#comment-13849110
 ] 

Hadoop QA commented on HDFS-5496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618878/HDFS-5496.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5730//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5730//console

This message is automatically generated.



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848975#comment-13848975
 ] 

Hadoop QA commented on HDFS-5496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618876/HDFS-5496.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5729//console

This message is automatically generated.



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848962#comment-13848962
 ] 

Vinay commented on HDFS-5496:
-

I also added a method that reports the progress of initialization, which can 
be used later to show the progress in the UI.



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848939#comment-13848939
 ] 

Vinay commented on HDFS-5496:
-

{quote}How about changing this to
{code}if (!isInSafeMode() || ((isInSafeMode() && 
safeMode.isPopulatingReplQueues()) && haEnabled)){code}
I.e., in non-HA setup, maybe we do not need to restart the processing since the 
NN already loads all the editlog before entering safemode?
And in checkMode(), can we change
{code}if (canInitializeReplQueues() && !isPopulatingReplQueues()) {
  initializeReplQueues();
}{code}
to
{code}if (canInitializeReplQueues() && !isPopulatingReplQueues() && 
    !haEnabled) {
  initializeReplQueues();
}{code}
because in HA setup we will run processMisReplicateBlocks in 
startActiveService.{quote}

I was thinking about this again, and I still have some doubts.
The above change will avoid reprocessing in non-HA mode.
But in HA mode, if startActiveServices() is called before safe mode reaches 
its threshold, then the following check will fail and skip the call to 
initialize the queues.
{code}if (!isInSafeMode() || ((isInSafeMode() && 
safeMode.isPopulatingReplQueues()) && haEnabled)){code} 
And in SafeMode#checkMode(), initialization will also be skipped because of 
the haEnabled check.
{code}if (canInitializeReplQueues() && !isPopulatingReplQueues() && 
!haEnabled){code}
So initialization of the replication queues can be missed completely. 

Am I missing something?




[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847744#comment-13847744
 ] 

Jing Zhao commented on HDFS-5496:
-

Thanks for the further explanation, Vinay. Now I see what you mean.
{code}
blockManager.clearQueues();
blockManager.processAllPendingDNMessages();
{code}
In my previous comment, I thought you also planned to remove the clearQueues() 
call. 

So if we clear all the queues, and continue with the ongoing initialization, 
the blocks that were processed before the startActiveService call will not be 
re-processed? I.e., we finally will generate the replication queues for only 
part of the blocks.
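
A toy model can make this concern concrete. It is only a sketch with hypothetical names, using integers as stand-ins for blocks and a plain list for the replication queue: if the queue is cleared partway through a scan and the scan simply continues, blocks visited before the clear are lost, whereas restarting the scan recovers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not BlockManager code: a scan that is interrupted by
// a queue clear after `clearAfter` blocks have been visited.
class ClearQueuesMidScanSketch {
  static List<Integer> scan(List<Integer> liveReplicas, int expectedReplication,
      int clearAfter, boolean restartOnClear) {
    List<Integer> queue = new ArrayList<>();
    boolean cleared = false;
    int i = 0;
    while (i < liveReplicas.size()) {
      if (!cleared && i == clearAfter) {
        queue.clear();            // stands in for blockManager.clearQueues()
        cleared = true;
        if (restartOnClear) {
          i = 0;                  // re-initialize: scan from the first block again
          continue;
        }
      }
      if (liveReplicas.get(i) < expectedReplication) {
        queue.add(i);             // queue the under-replicated block index
      }
      i++;
    }
    return queue;
  }

  public static void main(String[] args) {
    List<Integer> liveReplicas = Arrays.asList(1, 3, 2, 3, 0);
    System.out.println("continue after clear: " + scan(liveReplicas, 3, 3, false));
    System.out.println("restart after clear:  " + scan(liveReplicas, 3, 3, true));
  }
}
```

In this model, continuing after the clear leaves only the blocks visited afterwards in the queue, while restarting queues all under-replicated blocks at the cost of scanning some blocks twice.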



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847063#comment-13847063
 ] 

Vinay commented on HDFS-5496:
-

bq. So for that part of the change, can we make sure that 
markAllDatanodesStale() is called before calling processMisReplicateBlocks? 
Currently, in {{startActiveServices()}}, markAllDatanodesStale() is called 
before processMisReplicateBlocks(). The queues are also cleared unconditionally 
at this point, which destroys the result of any ongoing initialization.
{code}blockManager.setPostponeBlocksFromFuture(false);
blockManager.getDatanodeManager().markAllDatanodesStale();
blockManager.clearQueues();
blockManager.processAllPendingDNMessages();{code}
So that means if we continue with the ongoing initialization, all further 
blocks will be added only to the postponed list, right?
Are we missing something here?



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847055#comment-13847055
 ] 

Jing Zhao commented on HDFS-5496:
-

So for that part of the change, can we make sure that markAllDatanodesStale() 
is called before calling processMisReplicateBlocks? If we have not marked all 
DNs as stale, "num.replicasOnStaleNodes() > 0" may not be true. Can this be the 
case when we call initializeReplQueues() while in safemode and before calling 
startActiveService()?



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847049#comment-13847049
 ] 

Vinay commented on HDFS-5496:
-

I think block processing will be postponed, not put into the invalidate queue, 
because of the change below in the patch:
{code}
+    // postpone making any decision with stale replicas
+    if (numCurrentReplica > expectedReplication
+        && num.replicasOnStaleNodes() > 0) {
+      // If any of the replicas of this block are on nodes that are
+      // considered "stale", then these replicas may in fact have
+      // already been deleted. So, we cannot safely act on the
+      // over-replication until a later point in time, when
+      // the "stale" nodes have block reported.
+      return MisReplicationResult.POSTPONE;
+    }
{code}

right?

In that case, better to re-initialize?

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847044#comment-13847044
 ] 

Jing Zhao commented on HDFS-5496:
-

In an HA setup, markAllDatanodesStale() is currently only called in 
startActiveService(), so initializeReplQueues(), which is called before the 
datanodes are marked as stale, can wrongly put a block into the invalidate 
queue (see HDFS-1972), I think. 

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847026#comment-13847026
 ] 

Vinay commented on HDFS-5496:
-

Hi Jing,
I think that would work.
With that change, multiple initializations will not happen in non-HA mode.
In an HA setup, if safe mode is still in its extension period after 
{{initializeReplQueues()}} was called by the time {{startActiveServices()}} 
runs, re-initialization will be triggered.

Now my only question is: at this point, do we need to restart the 
initialization, or can we continue with any ongoing initialization?
I feel it's OK to continue. Any thoughts?

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846862#comment-13846862
 ] 

Jing Zhao commented on HDFS-5496:
-

bq. if (!isInSafeMode() && haEnabled) will fail to detect the need to restart 
the initialization.

How about changing this to 
{code}
if (!isInSafeMode()
    || ((isInSafeMode() && safeMode.isPopulatingReplQueues()) && haEnabled))
{code}
I.e., in non-HA setup, maybe we do not need to restart the processing since the 
NN already loads all the editlog before entering safemode?

And in checkMode(), can we change 
{code}
  if (canInitializeReplQueues() && !isPopulatingReplQueues()) {
    initializeReplQueues();
  }
{code}
to
{code}
  if (canInitializeReplQueues() && !isPopulatingReplQueues() && !haEnabled) {
    initializeReplQueues();
  }
{code}
because in HA setup we will run processMisReplicateBlocks in startActiveService.
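
For comparison, the guard from the patch and the guard suggested above can be tabulated side by side. This is only a sketch: the class and method names are made up, and the three booleans stand in for isInSafeMode(), safeMode.isPopulatingReplQueues() and haEnabled.

```java
/** Hypothetical comparison of the two guards discussed above; not Hadoop code. */
final class ReprocessGuardSketch {
  /** Guard from the patch: !isInSafeMode() && haEnabled. */
  static boolean patchGuard(boolean inSafeMode, boolean populating,
                            boolean haEnabled) {
    return !inSafeMode && haEnabled;
  }

  /** Guard suggested in this comment. */
  static boolean proposedGuard(boolean inSafeMode, boolean populating,
                               boolean haEnabled) {
    return !inSafeMode || ((inSafeMode && populating) && haEnabled);
  }

  public static void main(String[] args) {
    boolean[] values = {false, true};
    System.out.println("inSafeMode populating haEnabled | patch proposed");
    for (boolean s : values) {
      for (boolean p : values) {
        for (boolean h : values) {
          System.out.printf("%-10b %-10b %-9b | %-5b %b%n",
              s, p, h, patchGuard(s, p, h), proposedGuard(s, p, h));
        }
      }
    }
  }
}
```

The row to look at is safe mode on, queues being populated, HA enabled (the safe mode extension case): the patch guard evaluates to false there, while the suggested guard evaluates to true.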

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846783#comment-13846783
 ] 

Kihwal Lee commented on HDFS-5496:
--

I just want to provide a data point regarding the number of blocks to 
process per iteration. According to a measurement against a name node with a 
big name space, the initialization throughput was a little over 300K 
blocks/second. On this machine, the default limit of 10K will be equivalent to 
about 33ms of write lock duration (10K blocks / 300K blocks per second).  There 
is pressure to scan all blocks as soon as possible to avoid data loss by 
delayed replication, so a higher limit may be preferred in some situations. But 
I think it is okay as a default value.
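
As a rough sketch of the trade-off, the lock hold time per iteration is just the batch size divided by the scan throughput, and the scan takes the write lock once per batch. The class below is illustrative only: the names are made up, the block list stands in for the blocks map, and a plain ReentrantReadWriteLock stands in for the FSNamesystem lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical batched scan; not the code from the patch. */
final class BatchedScanSketch {
  /** Estimated write lock hold time for one batch, in milliseconds. */
  static double lockHoldMillis(long blocksPerIteration, double blocksPerSecond) {
    return blocksPerIteration / blocksPerSecond * 1000.0;
  }

  /**
   * Processes every block, holding the write lock for at most
   * blocksPerIteration blocks at a time. Returns the number of lock
   * acquisitions.
   */
  static int scanInBatches(List<Long> blocks, int blocksPerIteration,
                           ReentrantReadWriteLock lock, List<Long> processed) {
    if (blocksPerIteration <= 0) {
      throw new IllegalArgumentException("blocksPerIteration must be positive");
    }
    int acquisitions = 0;
    int cursor = 0;
    while (cursor < blocks.size()) {
      lock.writeLock().lock();
      try {
        acquisitions++;
        int end = Math.min(cursor + blocksPerIteration, blocks.size());
        for (; cursor < end; cursor++) {
          processed.add(blocks.get(cursor)); // stand-in for the per-block check
        }
      } finally {
        lock.writeLock().unlock(); // other requests can run between batches
      }
    }
    return acquisitions;
  }

  public static void main(String[] args) {
    System.out.printf("%.1f ms per batch%n", lockHoldMillis(10_000, 300_000));
    List<Long> blocks = new ArrayList<>();
    for (long i = 0; i < 25; i++) {
      blocks.add(i);
    }
    List<Long> processed = new ArrayList<>();
    int n = scanInBatches(blocks, 10, new ReentrantReadWriteLock(), processed);
    System.out.println(n + " lock acquisitions, " + processed.size() + " blocks");
  }
}
```

A larger batch shortens the total scan at the cost of a proportionally longer hold per acquisition.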

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846751#comment-13846751
 ] 

Kihwal Lee commented on HDFS-5496:
--

> If I understand correctly, Need to restart initializing queues right?

Yes.   Since the initialization is now asynchronous, a new kind of problem can 
occur.  A standby node in safe mode can transition to active, checkMode() 
causes the initialization to start (i.e. entering safe mode extension). At this 
point, if it transitions back to standby and again to active while in the safe 
mode extension period, {{if (!isInSafeMode() && haEnabled)}} will fail to 
detect the need to restart the initialization.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846178#comment-13846178
 ] 

Vinay commented on HDFS-5496:
-

bq. The following change would have been fine if leaving safe mode and 
initializing replication queues were synchronized. It appears checkMode() can 
start a background initialization before leaving the safe mode. Since the 
queues are unconditionally cleared right before the following, an on-going 
initialization should be stopped and redone.
If I understand correctly, Need to restart initializing queues right?

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845580#comment-13845580
 ] 

Kihwal Lee commented on HDFS-5496:
--

The following change would have been fine if leaving safe mode and initializing 
replication queues were synchronized.  It appears {{checkMode()}} can start a 
background initialization before leaving the safe mode. Since the queues are 
unconditionally cleared right before the following, an on-going initialization 
should be stopped and redone.

{code}
-if (!isInSafeMode() ||
-(isInSafeMode() && safeMode.isPopulatingReplQueues())) {
+// We only need to reprocess the queue in HA mode and not in safemode
+if (!isInSafeMode() && haEnabled) {
{code}

There have been discussions regarding removing safe mode extension and perhaps 
safe mode monitor. That will make the check/logic simpler.
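
One possible shape for "stopped and redone" is sketched below. It is not taken from the patch: the class and its methods are hypothetical, a list of block IDs stands in for the blocks map, and it assumes restart() is only ever called from a single controlling thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stop-and-redo wrapper around a background scan; not Hadoop code. */
final class RestartableInitSketch {
  private final List<Long> allBlocks;
  private final List<Long> queue = new ArrayList<>(); // guarded by synchronized (queue)
  private Thread worker; // touched only by the single controlling thread

  RestartableInitSketch(List<Long> allBlocks) {
    this.allBlocks = new ArrayList<>(allBlocks);
  }

  /** Stops any in-flight scan, clears the queue, then starts a fresh scan. */
  void restart() {
    if (worker != null) {
      worker.interrupt();
      join(worker); // the old scan cannot add entries after this returns
    }
    synchronized (queue) {
      queue.clear();
    }
    worker = new Thread(this::scan, "repl-queue-init-sketch");
    worker.start();
  }

  /** Blocks until the most recently started scan has finished. */
  void awaitCompletion() {
    if (worker != null) {
      join(worker);
    }
  }

  List<Long> snapshot() {
    synchronized (queue) {
      return new ArrayList<>(queue);
    }
  }

  private void scan() {
    for (Long block : allBlocks) {
      if (Thread.currentThread().isInterrupted()) {
        return; // abandoned in favour of a newer scan
      }
      synchronized (queue) {
        queue.add(block);
      }
    }
  }

  private static void join(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for scan", e);
    }
  }
}
```

Because the old scan is joined before the queue is cleared, entries from an abandoned scan cannot leak into the new one, so calling restart() twice in a row still leaves each block queued exactly once.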

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinay
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845529#comment-13845529
 ] 

Kihwal Lee commented on HDFS-5496:
--

It will be nice if the web UI says something if the replication queues are 
being initialized. Showing its progress will be a plus.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845150#comment-13845150
 ] 

Hadoop QA commented on HDFS-5496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618178/HDFS-5496.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5693//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5693//console

This message is automatically generated.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845113#comment-13845113
 ] 

Jing Zhao commented on HDFS-5496:
-

bq. means just return the call if already initialization is in progress?

Yes, I think so if my two assumptions stand..

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
> Attachments: HDFS-5496.patch, HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-10 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845046#comment-13845046
 ] 

Vinay commented on HDFS-5496:
-

bq. For (4), looks like currently we only retrieve metrics information from 
postponedMisreplicatedBlocks and we always check if the corresponding DNs are 
still stale before we make INVALIDATE decision. Thus it should be safe if we 
delay its initialization. 
For this, I am trying to make some changes in the patch. I hope the next patch 
will include them.
bq. For (2), currently we add under-replicated blocks into neededReplications 
when 1) initially populating the replication queue, 2) checking replication 
when finalizing an under-construction file, 3) checking replication progress 
for decommissioning DN, and 4) pending replicas timeout. Delaying 1) and making 
it happen in parallel with 2)~4) should also be safe.
I guess this is already in place, i.e. UnderReplicated Blocks are not added to 
neededReplications in {{processMisReplicatedBlock(..)}}.
{code}if (!block.isComplete()) {
  // Incomplete blocks are never considered mis-replicated --
  // they'll be reached when they are completed or recovered.
  return MisReplicationResult.UNDER_CONSTRUCTION;
}{code}
bq. For the current patch, I understand we need a new iterator that can iterate 
the blocksMap and not throw exception when concurrent modifications happen. 
However, I guess we may only need to define a new iterator and do not need to 
define the new BlocksMapGSet here. Also, since the new iterator shares most of 
the code with the existing LightWeightGSet#SetIterator, maybe we can simply 
extend SetIterator here?
Yes. Sure. 
bq. So for case 3, in non-HA setup, I think maybe we do not need to restart the 
processing since there should not be any pending editlog for NN to process in 
startActiveService? In HA setup, since we can always run 
processMisReplicateBlocks in startActiveService, we actually do not need to 
populate replication queue while still in safemode? If we're able to make these 
two changes, for the current patch, we do not need to worry about some 
already-running replication initializing thread.
This can be done. Does "do not need to worry about some already-running 
replication initializing thread" mean we just return from the call if 
initialization is already in progress?


> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
> Attachments: HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844737#comment-13844737
 ] 

Jing Zhao commented on HDFS-5496:
-

Another question is that, currently we call processMisReplicateBlocks when 1) 
starting active service, 2) leaving safemode, and 3) before leaving safemode if 
blockReplQueueThreshold is met. Specifically, with or without HA setup, we call 
processMisReplicateBlocks in the following cases:

# NN is not in safemode
# NN is in safemode, but we have not populated replication queue yet
# NN is in safemode, and we have already started populating the replication 
queue. We will restart the processing here.

So for case 3, in non-HA setup, I think maybe we do not need to restart the 
processing since there should not be any pending editlog for NN to process in 
startActiveService? In HA setup, since we can always run 
processMisReplicateBlocks in startActiveService, we actually do not need to 
populate replication queue while still in safemode? If we're able to make these 
two changes, for the current patch, we do not need to worry about some 
already-running replication initializing thread.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
> Attachments: HDFS-5496.patch
>
>


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844607#comment-13844607
 ] 

Jing Zhao commented on HDFS-5496:
-

Hi [~vinayrpet], your proposed solution looks good to me. Currently the 
processMisReplicatedBlock method updates 4 different data structures: (1) 
invalidateBlocks, storing blocks that do not belong to any file; (2) 
neededReplications, storing blocks that need to be replicated; (3) 
excessReplicateMap, tracking over-replicated blocks (these blocks are also 
added to invalidateBlocks); and (4) postponedMisreplicatedBlocks, storing 
blocks that appear over-replicated but for which we still need to wait for the 
deletion report from the corresponding DNs.

For (4), it looks like we currently only retrieve metrics information from 
postponedMisreplicatedBlocks, and we always check whether the corresponding DNs 
are still stale before we make an INVALIDATE decision. Thus it should be safe 
to delay its initialization. For (2), we currently add under-replicated blocks 
to neededReplications when 1) initially populating the replication queue, 2) 
checking replication when finalizing an under-construction file, 3) checking 
replication progress for a decommissioning DN, and 4) pending replicas time 
out. Delaying 1) and making it happen in parallel with 2)~4) should also be 
safe.
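
As a rough illustration of how one block ends up in one of the four data 
structures listed above, here is a deliberately simplified sketch. It is not 
the actual BlockManager code: the class MisReplicationSketch, the Result enum 
and the classify parameters are hypothetical, and the real method considers 
more conditions than shown here.

```java
// Hypothetical, simplified sketch of classifying a single block.
class MisReplicationSketch {

  enum Result { INVALID, UNDER_REPLICATED, POSTPONE, OVER_REPLICATED, OK }

  static Result classify(boolean belongsToFile, int liveReplicas,
      int expectedReplicas, int replicasOnStaleNodes) {
    if (!belongsToFile) {
      // (1) would be added to invalidateBlocks
      return Result.INVALID;
    }
    if (liveReplicas < expectedReplicas) {
      // (2) would be added to neededReplications
      return Result.UNDER_REPLICATED;
    }
    if (liveReplicas > expectedReplicas) {
      if (replicasOnStaleNodes > 0) {
        // (4) would be added to postponedMisreplicatedBlocks until the
        // stale DNs have reported
        return Result.POSTPONE;
      }
      // (3) would be tracked in excessReplicateMap and invalidateBlocks
      return Result.OVER_REPLICATED;
    }
    return Result.OK;
  }

  public static void main(String[] args) {
    System.out.println(classify(true, 2, 3, 0));
    System.out.println(classify(true, 4, 3, 1));
  }
}
```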

For the current patch, I understand we need a new iterator that can iterate 
over the blocksMap without throwing an exception when concurrent modifications 
happen. However, I think we may only need to define a new iterator, without 
defining the new BlocksMapGSet. Also, since the new iterator shares most of 
its code with the existing LightWeightGSet#SetIterator, maybe we can simply 
extend SetIterator here?
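
The idea of extending a fail-fast iterator so that it tolerates concurrent 
modification could look roughly like the sketch below. Everything in it is an 
assumption made for illustration: TinyBucketSet is a toy stand-in, not 
LightWeightGSet, and TolerantIterator and checkModification are hypothetical 
names rather than anything from the patch.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy bucketed set of block ids with a fail-fast iterator and a subclass
// that skips the modification check.
class TinyBucketSet {
  private static final class Node {
    final long id;
    final Node next;
    Node(long id, Node next) { this.id = id; this.next = next; }
  }

  private final Node[] buckets;
  private int modification = 0;

  TinyBucketSet(int capacity) { buckets = new Node[capacity]; }

  private int index(long id) {
    return (int) Math.floorMod(id, (long) buckets.length);
  }

  boolean add(long id) {
    int i = index(id);
    for (Node n = buckets[i]; n != null; n = n.next) {
      if (n.id == id) return false;
    }
    buckets[i] = new Node(id, buckets[i]); // prepend to the bucket's chain
    modification++;
    return true;
  }

  // Fail-fast iterator: throws if the set changed after it was created.
  class SetIterator implements Iterator<Long> {
    private final int startModification = modification;
    private int bucket = 0;
    private Node next = advance(null);

    private Node advance(Node current) {
      Node n = (current == null) ? null : current.next;
      while (n == null && bucket < buckets.length) n = buckets[bucket++];
      return n;
    }

    protected void checkModification() {
      if (modification != startModification) {
        throw new ConcurrentModificationException();
      }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public Long next() {
      checkModification();
      if (next == null) throw new NoSuchElementException();
      long id = next.id;
      next = advance(next);
      return id;
    }
  }

  // Same traversal, but modifications made during iteration are ignored.
  class TolerantIterator extends SetIterator {
    @Override protected void checkModification() { /* intentionally empty */ }
  }

  Iterator<Long> iterator() { return new SetIterator(); }
  Iterator<Long> tolerantIterator() { return new TolerantIterator(); }
}
```

In this toy version entries are only ever added, so every entry present when 
the iterator was created is still visited; entries added during the scan may 
or may not be seen. Removal and resizing are left out on purpose and would 
need separate care.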



[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832346#comment-13832346
 ] 

Hadoop QA commented on HDFS-5496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12615774/HDFS-5496.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5570//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5570//console

This message is automatically generated.


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-11-11 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819258#comment-13819258
 ] 

Aaron T. Myers commented on HDFS-5496:
--

+1, we should definitely do this. In the very worst case I've seen, the time 
taken to process the blocks map exceeded the 45s timeout of the ZKFC for the NN 
to become active, which can cause the NNs to flap back and forth between the 
active and standby state ad infinitum until one is manually killed. In the past 
I've upped this timeout to work around this issue, but really we should get the 
blocks map/replication queue processing out of the critical path of the NN 
transitioning to the active state.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Kihwal Lee
>

[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-11-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819247#comment-13819247
 ] 

Kihwal Lee commented on HDFS-5496:
--

With 125M blocks in BlocksMap, checking all blocks without adding anything to 
the under-replicated blocks queue took about 28 seconds. If there is a large 
number of under-replicated blocks, the initialization time can increase 
significantly. There is also an issue with the under-replicated blocks queue 
that makes it worse.
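
To put the figures above in per-block terms, the arithmetic can be worked 
through as in the short sketch below. Only the 125M blocks and 28 seconds come 
from the measurement above; the class LockHoldEstimate and the chunk size of 
10,000 blocks per iteration are hypothetical and used purely as an example.

```java
// Back-of-the-envelope estimate based on 125M blocks scanned in ~28 seconds.
class LockHoldEstimate {

  // Average cost of checking one block, in nanoseconds.
  static long nanosPerBlock(long blocks, long totalSeconds) {
    return totalSeconds * 1_000_000_000L / blocks;
  }

  // Estimated lock hold time, in milliseconds, for one bounded iteration.
  static double lockHoldMillis(long blocksPerIteration, long nanosPerBlock) {
    return blocksPerIteration * nanosPerBlock / 1_000_000.0;
  }

  public static void main(String[] args) {
    long perBlock = nanosPerBlock(125_000_000L, 28L); // 224 ns per block
    // 10,000 blocks per iteration is an assumed value, not from the issue.
    double holdMs = lockHoldMillis(10_000L, perBlock); // about 2.24 ms
    System.out.println(perBlock + " ns/block, " + holdMs + " ms per iteration");
  }
}
```

This covers only the scan cost with nothing being queued; as noted above, a 
large number of under-replicated blocks would raise the per-block cost.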
