[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9129:

Fix Version/s: (was: 3.0.0)

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.
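
The block-count check described above can be illustrated with a minimal sketch. It is a simplified, hypothetical model, not the actual HDFS code: the class name, the method names, and the single threshold fraction are assumptions made for illustration only.

```java
// Hypothetical, simplified sketch; not the actual BlockManagerSafeMode code.
class SafeModeBlockCountSketch {
  private final double thresholdPct; // assumed: fraction of blocks that must be safe
  private long blockThreshold;       // minimum number of safe blocks required
  private long blockSafe;            // blocks counted as safe so far

  SafeModeBlockCountSketch(double thresholdPct) {
    this.thresholdPct = thresholdPct;
  }

  // Recompute the threshold whenever the expected total block count is known.
  synchronized void setBlockTotal(long total) {
    blockThreshold = (long) (total * thresholdPct);
  }

  // Called when one more block is considered safe.
  synchronized void incrementSafeBlockCount() {
    blockSafe++;
  }

  // True once enough blocks are safe for the NN to consider leaving safemode.
  synchronized boolean hasEnoughBlocks() {
    return blockSafe >= blockThreshold;
  }
}
```

Keeping the counters and the threshold in one object is what allows them to live next to the block bookkeeping rather than in the namesystem.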



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9129:

Fix Version/s: 2.9.0

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9129:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Status: Patch Available  (was: Reopened)

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-01 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9129:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk. Thanks [~liuml07] for the contribution! Thanks 
[~wheat9] and [~daryn] for the discussion and review.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-01 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129-branch-2.025.patch

Thank you very much [~jingzhao] for your review, discussion and commit. I have 
uploaded the patch for {{branch-2}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-30 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.025.patch

Thanks [~jingzhao] for the review. I revisited the TODO comment and I also 
think we can remove it safely. Nice catch. The v25 patch addresses this.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-30 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Component/s: namenode

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.024.patch

Per offline discussion with [~jingzhao], the v24 patch removes 
{{shouldIncrementallyTrackBlocks}}, as it is not needed, and initializes 
{{startTime}} in {{activate}}.

After this change, all previous comments should have been addressed.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch, HDFS-9129.024.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-11 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.023.patch

The v23 patch addresses the findbugs warnings.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-10 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.022.patch

Thanks [~jingzhao] for your detailed review; I appreciate it very much.

The v22 patch addresses all the comments, with the following points still open for discussion:
{quote}
2. The following two initializations may be wrong: with the patch the safemode 
object is created when constructing BlockManager, before loading fsimage and 
editlog from disk.
{code}
private final long startTime = monotonicNow();
private long lastStatusReport = monotonicNow();
{code}
{quote}
In the v22 patch, {{lastStatusReport}} is initialized in the {{activate()}} 
method, when the status is first reported.
In the {{trunk}} code, {{startTime}} was the namesystem start time. As its 
only use is to calculate how long it takes to leave safemode for the status 
report, perhaps we don't need an accurate value? An alternative approach is to 
initialize the value in the {{activate()}} method.
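
As a rough illustration of deferring these timestamps to {{activate()}}, here is a minimal sketch. It is not the patch itself: the class name, the {{activated}} flag, and the helper methods are hypothetical, and {{monotonicNow()}} is a local stand-in for the call quoted above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual BlockManagerSafeMode code.
class SafeModeTimerSketch {
  // Deliberately not initialized at construction time: the object may be
  // created before the fsimage and edit log are loaded from disk.
  private boolean activated = false;
  private long startTime;
  private long lastStatusReport;

  // Local stand-in for a monotonic millisecond clock.
  static long monotonicNow() {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  // Assumed to be called once, when safe mode actually starts.
  synchronized void activate() {
    startTime = monotonicNow();
    lastStatusReport = startTime;
    activated = true;
  }

  synchronized boolean isActivated() {
    return activated;
  }

  // Time spent in safe mode so far, e.g. for a status report.
  synchronized long elapsedMillis() {
    if (!activated) {
      throw new IllegalStateException("activate() has not been called");
    }
    lastStatusReport = monotonicNow();
    return lastStatusReport - startTime;
  }
}
```

The explicit flag is used instead of a sentinel timestamp because a monotonic clock value has no guaranteed sign or origin.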


{quote}
3. {{shouldIncrementallyTrackBlocks}} is actually determined by {{haEnabled}} 
thus looks like it can be declared as final and {{isSafeModeTrackingBlocks}} 
can be simplified.
{quote}
Yes, it depends only on {{haEnabled}} when {{setBlockTotal()}} is first 
called by {{activate()}}. However, it's hard to make it final, as it's not 
initialized in the {{BlockManagerSafeMode}} constructor. In any case, 
{{isSafeModeTrackingBlocks}} is simplified.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-04 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.021.patch

The failing tests pass locally. The v21 patch addresses the findbugs 
warnings.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-03 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.020.patch

Per offline discussion with [~jingzhao], the v19 patch has a fatal bug in 
initializing the replication queue. The v20 patch fixes this bug and revisits 
the synchronized behavior.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-03 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: (was: HDFS-9129.020.patch)

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-03 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.020.patch

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.019.patch

The v19 patch is the latest effort to simplify the {{BlockManagerSafeMode}} 
status. The {{INITIALIZED}} enum made the safe mode more complicated than 
necessary and is thus removed.

Let's see the Jenkins report for the functional tests. I will revisit the 
synchronized behavior.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.018.patch

The v18 patch fixes the flaky {{TestBlockManagerSafeMode}} unit test.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.016.patch

The failing tests pass on my local machine. The v16 patch addresses the 
findbugs warnings.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.017.patch

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.012.patch

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.015.patch

The v15 patch fixes another conflict with [HDFS-4015]. Per offline discussion 
with [~arpitagarwal] and [~anu], the {{smmthread}} should not repeatedly 
report that:
{quote}
Refusing to leave safe mode without a force flag. Exiting safe mode will cause 
a deletion of 590683116 byte(s). Please use -forceExit flag to exit safe mode 
forcefully if data loss is acceptable.
{quote}
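
One possible way to keep a periodically running monitor from repeating such a message is a simple "already warned" guard. The sketch below only illustrates that idea and is not taken from the patch; the class, the method names, and the in-memory message list (used here in place of a real logger) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a warn-once guard; not the actual safe mode monitor.
class ForceExitWarningSketch {
  private boolean warned = false;
  // Stand-in for a logger so the behaviour can be inspected.
  private final List<String> messages = new ArrayList<>();

  // Assumed to be called on every monitor iteration.
  // Returns true if safe mode may be left.
  synchronized boolean tryLeave(long bytesToBeDeleted, boolean forceExit) {
    if (bytesToBeDeleted > 0 && !forceExit) {
      if (!warned) {
        messages.add("Refusing to leave safe mode without a force flag.");
        warned = true; // suppress the same warning on later iterations
      }
      return false;
    }
    warned = false; // reset so a later refusal is reported again
    return true;
  }

  synchronized int messageCount() {
    return messages.size();
  }
}
```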

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.013.patch

The v13 patch revisits the synchronized methods in {{BlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.014.patch

The v14 patch resolves the conflicts with [HDFS-4015]. Thanks to [~anu] for 
kindly pointing out the bug and helping me reproduce it. A new unit test is 
added to {{TestBlockManagerSafeMode}} as well.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.011.patch

{quote}
{code}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;

...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
How do these two variables synchronize? Is the system in consistent state in 
the middle of the execution?
{quote}

Per offline discussion with [~wheat9], the {{volatile}} keyword is considered 
premature optimization. Making {{isInResourceLowSafeMode}} short-circuit 
{{isInManualSafeMode}} is a bad design for upcoming changes. The v11 patch uses 
a synchronized block to keep the system in a consistent state in the middle of 
the execution.
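
As a rough illustration of the synchronized approach, here is a minimal sketch. 
It assumes a hypothetical holder class {{SyncSafeModeFlags}} (the class and 
method names are illustrative and not taken from the patch). Both flags are 
written and read under the same monitor, so a reader should never observe the 
state between the two assignments.

```java
// Hypothetical illustration; SyncSafeModeFlags is not a class from the patch.
final class SyncSafeModeFlags {
  private boolean isInManualSafeMode = false;
  private boolean isInResourceLowSafeMode = false;

  // Both writes happen while holding this object's monitor.
  synchronized void enterSafeMode(boolean resourcesLow) {
    isInManualSafeMode = !resourcesLow;
    isInResourceLowSafeMode = resourcesLow;
  }

  synchronized void leaveSafeMode() {
    isInManualSafeMode = false;
    isInResourceLowSafeMode = false;
  }

  // Readers take the same monitor, so they see the state either before
  // enterSafeMode() ran or after it finished, never the one in between.
  synchronized boolean isInSafeMode() {
    return isInManualSafeMode || isInResourceLowSafeMode;
  }
}
```

The cost is that every read now takes the lock, which is the trade-off 
against the {{volatile}} variant.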

The comment for {{BlockManagerSafeMode}} (ahead of the class) is also refined.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.010.patch

Thank you [~wheat9] for your review. The v10 patch addresses the comments and 
is rebased on the {{trunk}} branch. As there are conflicts because of 
[HDFS-4015], I also invited [~arpitagarwal] and [~anu] to review the patch.

See response inline to [~wheat9]'s comments.
{quote}
1. It does not give much information compared to figuring out the issues on the 
code directly. What does "thresholds met" / "extensions reached" mean? It 
causes more confusions than explanations.
{quote}
The motivation is that a diagram is helpful at a glance, provided we define 
"thresholds met" / "extension reached". In the v10 patch, I added more 
explanation in the comments.

{quote}
{code}
LOG.error("Non-recognized block manager safe mode status: {}", status);
{code}
2. Should be an assert.
{quote}
Indeed. I'll simply use {{assert false : "some comment"}}.

{quote}
{code}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;

...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
3. How do these two variables synchronize? Is the system in consistent state in 
the middle of the execution?
{quote}
Good question. Actually it's not in a consistent state in the middle of the 
execution. If {{resourcesLow}} is true and the name node was previously in 
manual safe mode, then in the middle of the execution {{isInSafeMode}} will 
return false, which means the safe mode is OFF. The main reason is that writes 
to the two variables (i.e. entering/leaving safe mode) are guarded by the 
FSNamesystem write lock, while reads are not.
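
To make the window concrete, here is a small single-threaded walk-through. It 
is only a sketch: the class {{FlagWindowDemo}} and the method 
{{observedBetweenWrites}} are hypothetical, and the "reader" is simulated by 
evaluating {{isInSafeMode()}} between the two writes rather than by a real 
second thread.

```java
// Hypothetical walk-through; field names mirror the snippet quoted above.
final class FlagWindowDemo {
  static boolean isInManualSafeMode;
  static boolean isInResourceLowSafeMode;

  static boolean isInSafeMode() {
    return isInManualSafeMode || isInResourceLowSafeMode;
  }

  // Returns what an unguarded reader would see between the two writes.
  static boolean observedBetweenWrites(boolean resourcesLow) {
    // Starting point: the name node is already in manual safe mode.
    isInManualSafeMode = true;
    isInResourceLowSafeMode = false;

    isInManualSafeMode = !resourcesLow;      // first write
    boolean seenByReader = isInSafeMode();   // a reader sneaking in here
    isInResourceLowSafeMode = resourcesLow;  // second write
    return seenByReader;
  }

  public static void main(String[] args) {
    System.out.println("between writes: " + observedBetweenWrites(true));
    System.out.println("after both writes: " + isInSafeMode());
  }
}
```

With {{resourcesLow}} set to true, the simulated reader sees {{false}} between 
the writes, although safe mode is on both before and after them.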

The enum type state was replaced with two boolean flags in the v7 patch, because 
the two-layer state machine was cumbersome per offline discussion with [~wheat9] 
and [~jingzhao]. Guarding all the reads looks expensive, and bitwise operations 
on a single flag variable seem tricky.

The new design goes back to the {{trunk}} logic, which keeps the block manager 
in safe mode in the middle of the execution:
{code}
  if (resourcesLow) {
    isInResourceLowSafeMode = true;
  } else {
    isInManualSafeMode = true;
  }
{code}
In case both {{isInManualSafeMode}} and {{isInResourceLowSafeMode}} are true, 
{{isInResourceLowSafeMode}} will short-circuit {{isInManualSafeMode}}, 
according to our current logic, e.g. {{getTurnOffTip()}}.
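
A minimal sketch of how this short-circuit could look, assuming a hypothetical 
class {{AdditiveSafeModeFlags}} and a hypothetical {{turnOffTip()}} method. 
The returned strings are placeholders, not the real messages produced by 
{{getTurnOffTip()}}.

```java
// Hypothetical illustration; the tip strings are placeholders, not HDFS output.
final class AdditiveSafeModeFlags {
  private boolean isInManualSafeMode = false;
  private boolean isInResourceLowSafeMode = false;

  // Only ever sets a flag, so a concurrent reader cannot see safe mode
  // switch off while it is being entered.
  void enterSafeMode(boolean resourcesLow) {
    if (resourcesLow) {
      isInResourceLowSafeMode = true;
    } else {
      isInManualSafeMode = true;
    }
  }

  boolean isInSafeMode() {
    return isInManualSafeMode || isInResourceLowSafeMode;
  }

  // The resource-low check comes first, so it wins when both flags are set.
  String turnOffTip() {
    if (isInResourceLowSafeMode) {
      return "resource-low";
    }
    if (isInManualSafeMode) {
      return "manual";
    }
    return "";
  }
}
```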

{quote}
{code}
+// INITIALIZED -> THRESHOLD
+bmSafeMode.setBlockTotal(BLOCK_TOTAL);
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+assertTrue(bmSafeMode.isInSafeMode());
{code}
4. It makes sense to put it in a test instead of in the @Before method.
{quote}
That makes sense to me. I'll add a new test called {{testSetBlockTotal}}.

{quote}
{code}
+// EXTENSION -> OFF
+Whitebox.setInternalState(bmSafeMode, "status", 
BMSafeModeStatus.EXTENSION);
+reachBlockThreshold();
+reachExtension();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.OFF, getSafeModeStatus());
{code}
5. Please refactor the code – you can reuse the getSafeModeStatus() that is 
defined by the class.
{quote}
Actually the first statement of each state transition test is to *set* the 
internal state.
I'll make the fields package accessible so that Whitebox is not needed. See the 
end of this comment.

{quote}
{code}
assertEquals(getBlockSafe(), i);
{code}
6. The expected value should go first according to the function signature.
{quote}
Nice catch. I will revise the whole test to fix all similar ones.
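
The argument order only matters when the assertion fails, because the failure 
message labels the first argument as the expected value. A small sketch, using 
a hypothetical stand-in for JUnit's {{assertEquals(expected, actual)}} so that 
it runs without the JUnit library (the class {{ExpectedFirstDemo}} and the 
helper {{failureMessage}} are made up for illustration):

```java
// Stand-in for JUnit's assertEquals(expected, actual); hypothetical helper
// so that the example runs without the JUnit library on the classpath.
final class ExpectedFirstDemo {
  static void assertEquals(long expected, long actual) {
    if (expected != actual) {
      throw new AssertionError(
          "expected:<" + expected + "> but was:<" + actual + ">");
    }
  }

  // Returns the failure message, or an empty string if the values match.
  static String failureMessage(long expected, long actual) {
    try {
      assertEquals(expected, actual);
      return "";
    } catch (AssertionError e) {
      return e.getMessage();
    }
  }
}
```

If the loop index is 5 and {{getBlockSafe()}} returns 3, then 
{{failureMessage(5, 3)}} reports that 5 was expected and 3 was found, while 
the swapped call reports the opposite and misleads whoever reads the log.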

{quote}
7. A higher level question is that why it needs so many getInternalState() 
statements? It looks to me that many of these behaviors can be observed outside 
without whitebox testing.
{quote}
The reason was that *all* non-static fields in {{BlockManagerSafeMode}} are 
private. By design, I made them _private_ because we don't need access to its 
internal state in {{BlockManager}}.

In the new design, the v10 patch simply makes the fields package private so the 
test will be more straightforward.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> 

[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.008.patch

The v8 patch addresses one checkstyle warning and one failing unit test in the 
v7 patch. The other checkstyle warnings are pre-existing and cannot be resolved 
in this patch. The Findbugs warnings are unrelated. The other failing unit test 
seems flaky. I am working on a new unit test, {{TestBlockManagerSafeMode}}, for 
the next patch.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.009.patch

In the v8 patch, the findbugs warning is unrelated, and the checkstyle warnings 
are about method or file length, which we can do nothing about in this patch. 
The overall test results look good.

Per offline discussion with [~jingzhao] and [~wheat9], this patch adds a new 
white-box unit test class named {{TestBlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.007.patch

Thank you [~wheat9] for your detailed comments. The v7 patch addresses most of 
them. See response inline.

{quote}
{code}
  if (status == BMSafeModeStatus.OFF) {
    return;
  }
{code}
1. There are multiple cases like the above. They should be asserts.
{quote}

It is safe for these methods to be called multiple times. If safe mode is not 
currently on, the call is a no-op. Previously we checked whether {{safeMode}} 
was null before calling the respective methods. E.g.
{code:title=previous volatile and null check to allow calling multiple times}
  public void checkSafeMode() {
    // safeMode is volatile, and may be set to null at any time
    SafeModeInfo safeMode = this.safeMode;
    if (safeMode != null) {
      safeMode.checkMode();
    }
  }
{code}
As we moved the startup safe mode to {{BlockManager}} and maintain a state 
machine for the safe mode status, the volatile and null {{safeMode}} trick is 
no longer needed. Meanwhile, we must allow {{BlockManagerSafeMode}}'s methods 
to be called multiple times without side effects.
I will explicitly elaborate on this in the comments for these methods.
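
A minimal sketch of such an idempotent method, assuming a hypothetical class 
{{IdempotentSafeMode}} with a reduced two-value status enum. The 
{{leaveCount}} counter exists only to make the no-op visible and is not part 
of the patch.

```java
// Hypothetical illustration of an idempotent state check; not the patch code.
final class IdempotentSafeMode {
  enum Status { THRESHOLD, OFF }

  private Status status = Status.THRESHOLD;
  private int leaveCount = 0; // counts real transitions, for illustration only

  // Safe to call any number of times: once the status is OFF it is a no-op.
  void leaveSafeMode() {
    if (status == Status.OFF) {
      return;
    }
    status = Status.OFF;
    leaveCount++;
  }

  boolean isInSafeMode() {
    return status != Status.OFF;
  }

  int getLeaveCount() {
    return leaveCount;
  }
}
```

Calling {{leaveSafeMode()}} twice performs the transition only once, which is 
the behavior the early return is meant to guarantee.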

{quote}
2. namesystem.isHaEnabled() does not change in the lifecycle of the process.
{quote}
This is a very good point. I considered this but the {{blockManager}}, which 
will create the {{blockManagerSafeMode}}, is constructed before the 
{{namesystem#haEnabled}} is initialized. Per offline discussion with [~wheat9] 
and [~jingzhao], we can initialize the {{namesystem#haEnabled}} before 
constructing the {{blockManager}}. This way, the {{namesystem.isHaEnabled}} is 
not called repeatedly in the critical path.

{quote}
3. It's better to document the conditions of state transition.
{quote}
Yes, it makes a lot of sense to document the state machine transitions. In the 
v7 patch, I added a diagram in the comment.

{quote}
{code}
+needExtension = extension > 0 &&
+(blockThreshold > 0 || datanodeThreshold > 0);
{code}
4. This can be moved under the THRESHOLD statement and become a local variable.
{quote}
Actually it's hard, if not impossible, largely because {{needExtension}} 
should be initialized in the start status, aka {{INITIALIZED}}. There is a 
regression test for this case, 
{{TestHASafeMode#testBlocksRemovedBeforeStandbyRestart}}, introduced by 
[HDFS-2692].

{quote}
5. initializeReplQueuesIfNecessary() should be called only once.
{quote}
Yes, we should initialize the replication queue only once. Once called for the 
first time, {{BlockManager#initializeReplQueues}} will set a flag indicating 
that the replication queue is initialized. We'll check this flag in 
{{isPopulatingReplQueues}} before calling {{initializeReplQueues}} again. As it 
is of great importance to guarantee this, I'll double check it and fix it in 
the next patch.
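
A simplified sketch of the guard described above. The class 
{{ReplQueueInitSketch}} is hypothetical and the method bodies are assumptions 
(the real {{BlockManager}} methods do more work than this). It only shows the 
flag check that keeps the initialization from running twice.

```java
// Hypothetical, simplified sketch; the real BlockManager methods do more work.
final class ReplQueueInitSketch {
  private boolean initializedReplQueues = false;
  private int initCalls = 0; // counts real initializations, for illustration

  boolean isPopulatingReplQueues() {
    return initializedReplQueues;
  }

  private void initializeReplQueues() {
    initCalls++;
    initializedReplQueues = true; // flag set on the first call
  }

  // The flag is consulted first, so repeated calls initialize only once.
  void initializeReplQueuesIfNecessary() {
    if (!isPopulatingReplQueues()) {
      initializeReplQueues();
    }
  }

  int getInitCalls() {
    return initCalls;
  }
}
```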

{quote}
6. safeModeStatus = SafeModeStatus.OFF should be moved to BlockManagerSafeMode.
{quote}
The {{safeModeStatus}} was for the {{FSNamesystem}} and the *OFF* here 
indicates both {{FSNamesystem}} and {{BlockManager}} leave the safe mode. 
{{BlockManagerSafeMode}}'s internal {{status}} was maintained in its own 
{{leaveSafeMode}} method.

Per offline discussion with [~wheat9] and [~jingzhao], the better design is to 
simplify the {{Namesystem}} safe mode to two flags indicating _manual_ or 
_resource low_. In this way, the safe mode check is pretty straightforward. The 
side benefit is that when we extend the current safe mode status, one more flag 
will work just fine, without breaking the existing code.
{code:title=new manual and resource low safe mode flag}
  private volatile boolean isInManualSafeMode = false;
  private volatile boolean isInResourceLowSafeMode = false;
  ...
  @Override
  public boolean isInSafeMode() {
    return isInManualSafeMode ||
        isInResourceLowSafeMode ||
        blockManager.isInSafeMode();
  }
{code}

{quote}
7. A cleaner approach is to put the {{reached}} timestamp into the constructor 
of {{SafeModeMonitor()}}.
{quote}
It's a good point to define the {{reached}} value in the monitor. The v7 patch 
initializes it when the monitor starts. As the {{reached}} timestamp is partly 
used outside {{SafeModeMonitor}}, e.g. in {{getSafeModeTip}} and {{checkMode}}, 
the easy (though maybe not the best) way is to treat it as a class field.

{quote}
8. It might be good to have additional unit tests for BlockManagerSafeMode.
{quote}
That makes perfect sense to me. I'll add a new unit test named 
{{TestBlockManagerSafeMode}} in the next patch.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>

[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-17 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.006.patch

As per offline discussion, the v6 patch prefers a synchronized 
{{getSafeModeTip}} over volatile fields. Some public methods in 
{{BlockManagerSafeMode}} that existed only for tests are removed as well.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.004.patch

The v4 patch addresses the failing tests; further refactoring is possible. 
Please hold off on reviewing it.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.005.patch

The failing test is not related. The v5 patch stops tracking safe blocks after 
leaving safe mode. Any comment is welcome. I will work on reducing the 
synchronization overhead in {{BlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-14 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.003.patch

In the v3 patch, the {{Namesystem}} no longer moves from the *STARTUP* state to 
the *OFF* state by itself. Instead, it asks the {{BlockManager}} every time 
{{isInStartupSafeMode}} is called. This way, we are able to simplify 
{{Namesystem}}'s state machine by delegating all *STARTUP* safe mode logic, 
which includes leaving safe mode, to {{BlockManager}} and 
{{BlockManagerSafeMode}}.
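
A minimal sketch of this delegation, assuming a hypothetical interface 
{{BlockManagerView}} and a hypothetical class {{NamesystemSketch}} (neither 
name comes from the patch). The name system keeps no startup state of its own 
and answers each query from the block manager's current state.

```java
// Hypothetical sketch; the interface and class names are illustrative only.
interface BlockManagerView {
  boolean isInSafeMode();
}

final class NamesystemSketch {
  private final BlockManagerView blockManager;

  NamesystemSketch(BlockManagerView blockManager) {
    this.blockManager = blockManager;
  }

  // No STARTUP -> OFF transition is stored here; the answer is always
  // derived from the block manager's current state.
  boolean isInStartupSafeMode() {
    return blockManager.isInSafeMode();
  }
}
```

Because nothing is cached, the name system reports the change as soon as the 
block manager leaves safe mode, without any extra notification.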

The v3 patch is for Jenkins. Please hold off on reviewing it.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-08 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.002.patch

Thank you [~jingzhao]. The v2 patch addresses the comments.

Thank you [~daryn] for your input. I briefly illustrate the current design as 
follows. The patch is not yet complete and further refactoring may be necessary.

Basically, the patch splits the name node safe mode into two levels. The 
first one is the {{FSNamesystem}} and the second one is 
{{BlockManagerSafeMode}}. The main code change has two parts:
# The first-level safe mode code is kept in {{FSNamesystem}}
# The second-level safe mode is moved to the {{blockmanagement}} package

At the beginning, the name node is in *STARTUP* safe mode, where the block 
manager is tracking blocks and data nodes. The name node will leave *STARTUP* 
mode to
* *OFF*: if either of the following two conditions is met
*# The second-level safe mode is *OFF*. This is the case where the block 
manager leaves safe mode automatically once the thresholds and extension are met
*# the administrator leaves safe mode manually
* *MANUALLY*: the administrator enters safe mode manually
* *RESOURCE_LOW*: low resources are detected

The first-level safe mode is a simple state machine. Other transitions, like 
*MANUALLY* to *OFF*, are straightforward.

As can be inferred from the above, the second level is meaningful and valid if 
and only if the first-level safe mode is in *STARTUP*. At the beginning, the 
block manager is in *INITIALIZED* mode, and it will leave this mode if:
* thresholds are met (to *OFF* mode), as no extension is needed
* thresholds are not met (to *THRESHOLD* mode)

The *THRESHOLD* mode waits on the block and data node thresholds. Once the 
thresholds are met, the block manager will leave this mode and change to:
* *OFF* if extension is not needed (e.g. {{extension}} config value is 0)
* *EXTENSION* if extension is needed

The *EXTENSION* mode waits on the extension period. The block manager will 
leave this mode to *OFF* once both of the following conditions hold:
* extension period is reached (checked by a monitor thread)
* thresholds are met
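
The second-level transitions described above can be sketched as a small state 
machine. This is only an illustration under stated assumptions: the class 
{{BlockSafeModeSketch}}, the {{check}} method and its two boolean parameters 
are hypothetical, and the real code derives these conditions from block and 
data node counts and from a monitor thread.

```java
// Hypothetical sketch of the second-level state machine; not the patch code.
final class BlockSafeModeSketch {
  enum Status { INITIALIZED, THRESHOLD, EXTENSION, OFF }

  private Status status = Status.INITIALIZED;
  private final boolean needExtension;

  BlockSafeModeSketch(boolean needExtension) {
    this.needExtension = needExtension;
  }

  Status getStatus() {
    return status;
  }

  // One evaluation step; the caller passes in the current observations.
  void check(boolean thresholdsMet, boolean extensionElapsed) {
    switch (status) {
      case INITIALIZED:
        // Thresholds already met: straight to OFF, otherwise wait on them.
        status = thresholdsMet ? Status.OFF : Status.THRESHOLD;
        break;
      case THRESHOLD:
        if (thresholdsMet) {
          status = needExtension ? Status.EXTENSION : Status.OFF;
        }
        break;
      case EXTENSION:
        // Both conditions must hold before leaving safe mode.
        if (thresholdsMet && extensionElapsed) {
          status = Status.OFF;
        }
        break;
      default:
        // OFF: nothing to do, so repeated calls are harmless.
        break;
    }
  }
}
```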

The main design motivation is that the {{FSNamesystem}} and {{BlockManager}} 
maintain their own states by themselves.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-05 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.000.patch

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-05 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Status: Patch Available  (was: Open)

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-05 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.001.patch

The v1 patch fixes the failing tests (locally) and whitespace warnings.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.


