[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9129: Fix Version/s: (was: 3.0.0) > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, > HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, > HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, > HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, > HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, > HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, > HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, > HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, > HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, > HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9129: Fix Version/s: 2.9.0 > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, > HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, > HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, > HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, > HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, > HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, > HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, > HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, > HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, > HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9129: Resolution: Fixed Status: Resolved (was: Patch Available) > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Fix For: 2.9.0 > > Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, > HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, > HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, > HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, > HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, > HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, > HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, > HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, > HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, > HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Status: Patch Available (was: Reopened) > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Fix For: 3.0.0 > > Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, > HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, > HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, > HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, > HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, > HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, > HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, > HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, > HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, > HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9129: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) I've committed this into trunk. Thanks [~liuml07] for the contribution! Thanks [~wheat9] and [~daryn] for the discussion and review. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Fix For: 3.0.0 > > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129-branch-2.025.patch

Thank you very much [~jingzhao] for your review, discussion and commit. I have uploaded the patch for {{branch-2}}.

> Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Fix For: 3.0.0 > > Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, > HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, > HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, > HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, > HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, > HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, > HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, > HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, > HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, > HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can be moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.025.patch Thanks [~jingzhao] for the review. I revisited the TODO comment and I also think we can remove it safely. Nice catch. The v25 patch is to address this. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Component/s: namenode > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch, HDFS-9129.024.patch, HDFS-9129.025.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.024.patch Per offline discussion with [~jingzhao], the v24 patch removes the {{shouldIncrementallyTrackBlocks}} as it's not needed, and initializes the {{startTime}} in {{activate}}. After this change, all previous comments should have been addressed. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch, HDFS-9129.024.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.023.patch The v23 patch is to address the findbugs warnings. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, > HDFS-9129.023.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.022.patch

Thanks [~jingzhao] for your detailed review; I appreciate it very much. The v22 patch addresses all the comments, with the following points still open for discussion:

{quote}
2. The following two initializations may be wrong: with the patch the safemode object is created when constructing BlockManager, before loading fsimage and editlog from disk.
{code}
private final long startTime = monotonicNow();
private long lastStatusReport = monotonicNow();
{code}
{quote}

In the v22 patch, {{lastStatusReport}} is initialized in the {{activate()}} method, when the status is first reported. In the {{trunk}} code, {{startTime}} was the namesystem start time. As its only use is to calculate how long it takes to leave safe mode for the status report, perhaps we don't need an accurate value? An alternative approach is to initialize it in the {{activate()}} method.

{quote}
3. {{shouldIncrementallyTrackBlocks}} is actually determined by {{haEnabled}} thus looks like it can be declared as final and {{isSafeModeTrackingBlocks}} can be simplified.
{quote}

Yes, it depends only on {{haEnabled}} when {{setBlockTotal()}} is first called by {{activate()}}. However, it's hard to make it final because it's not initialized in the {{BlockManagerSafeMode}} constructor. In any case, {{isSafeModeTrackingBlocks}} is simplified.
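For readers following the {{startTime}} discussion above, here is a minimal sketch of the "initialize in {{activate()}}" alternative. It is not the code from the patch: the class name {{SafeModeTimer}}, the injected clock and the method {{millisSinceActivation()}} are hypothetical, and the clock is only a stand-in for a monotonic time source such as {{monotonicNow()}}.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical illustration only; not the actual BlockManagerSafeMode code.
 * The timestamps are taken in activate() rather than in field initializers,
 * so time spent between construction and activation is not counted.
 */
class SafeModeTimer {
  private final LongSupplier clock;   // assumed monotonic clock, in milliseconds
  private boolean activated = false;
  private long startTime;             // set in activate(), not at construction
  private long lastStatusReport;      // set in activate(), not at construction

  SafeModeTimer(LongSupplier clock) {
    this.clock = clock;
  }

  /** Called once the object really starts tracking safe mode. */
  void activate() {
    startTime = clock.getAsLong();
    lastStatusReport = startTime;
    activated = true;
  }

  /** Elapsed time since activate(), e.g. for a status report. */
  long millisSinceActivation() {
    if (!activated) {
      throw new IllegalStateException("activate() has not been called");
    }
    return clock.getAsLong() - startTime;
  }
}
```

With a fake clock, constructing the object at time 100 and activating it at time 500 yields an elapsed time of 150 when read at time 650, which shows that the construction time no longer leaks into the measurement.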
> Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.021.patch The failing tests can pass locally. The v21 patch is to address the findbugs warnings. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch, HDFS-9129.021.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.020.patch Per offline discussion with [~jingzhao], the v19 patch has a fatal bug for initializing the replication queue. The v20 patch fixes this bug and revisits the synchronized behavior. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: (was: HDFS-9129.020.patch) > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.020.patch > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, > HDFS-9129.020.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.019.patch

The v19 patch is the latest effort to simplify the {{BlockManagerSafeMode}} status. The {{INITIALIZED}} enum was considered to make the safe mode complicated and is thus removed. Let's see the Jenkins report for the functional tests. I will revisit the synchronized behavior.

> Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can be moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.018.patch The v18 patch fixes the flaky {{TestBlockManagerSafeMode}} unit test. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch, HDFS-9129.018.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.016.patch The failing tests can pass on my local machine. The v16 patch is to address the findbugs warnings. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.017.patch > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, > HDFS-9129.017.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.012.patch > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.015.patch

The v15 patch fixes another conflict with [HDFS-4015]. Per offline discussion with [~arpitagarwal] and [~anu], the {{smmthread}} should not repeatedly report the following:

{quote}
Refusing to leave safe mode without a force flag. Exiting safe mode will cause a deletion of 590683116 byte(s). Please use -forceExit flag to exit safe mode forcefully if data loss is acceptable.
{quote}

> Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch, HDFS-9129.015.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can be moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.013.patch The v13 patch revisits the synchronized methods in {{BlockManagerSafeMode}} > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.014.patch

The v14 patch resolves the conflicts with [HDFS-4015]. Thanks to [~anu] for kindly pointing out the bug and helping me reproduce it. A new unit test is added to {{TestBlockManagerSafeMode}} as well.

> Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, > HDFS-9129.014.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can be moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.011.patch

{quote}
{code}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;
...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
How do these two variables synchronize? Is the system in consistent state in the middle of the execution?
{quote}

Per offline discussion with [~wheat9], the {{volatile}} keyword is considered premature optimization. Making {{isInResourceLowSafeMode}} short-circuit {{isInManualSafeMode}} would be a bad design for upcoming changes. The v11 patch uses a synchronized block so that the system stays in a consistent state in the middle of the execution. The class-level comment for {{BlockManagerSafeMode}} is also refined.

> Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, > HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, > HDFS-9129.011.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can be moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
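As a rough illustration of the synchronization point raised in the message above, the sketch below guards both the two-flag update and the read with the same monitor, so a reader cannot observe the moment where one flag has been cleared and the other not yet set. This is an assumption about the general technique, not the code in the v11 patch: the class {{SafeModeFlags}} and the methods {{enterSafeMode}} and {{leaveSafeMode}} are hypothetical, and only the two field names and {{isInSafeMode}} come from the thread.

```java
/**
 * Hypothetical illustration only; not the actual BlockManagerSafeMode code.
 * Writers and readers synchronize on the same object, so the two flags are
 * always seen as a consistent pair.
 */
class SafeModeFlags {
  private boolean isInManualSafeMode = false;
  private boolean isInResourceLowSafeMode = false;

  /** Sets both flags as one step with respect to other synchronized methods. */
  synchronized void enterSafeMode(boolean resourcesLow) {
    // Without a shared lock, a reader running between these two writes could
    // see both flags false when switching from manual to resource-low mode.
    isInManualSafeMode = !resourcesLow;
    isInResourceLowSafeMode = resourcesLow;
  }

  /** Clears both flags together. */
  synchronized void leaveSafeMode() {
    isInManualSafeMode = false;
    isInResourceLowSafeMode = false;
  }

  /** Takes the same monitor as the writers, so it never sees a half-updated pair. */
  synchronized boolean isInSafeMode() {
    return isInManualSafeMode || isInResourceLowSafeMode;
  }
}
```

The trade-off mentioned in the thread still applies: every read now acquires a lock, which is simpler to reason about than two {{volatile}} fields but is not free on a hot path.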
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.010.patch

Thank you [~wheat9] for your review. The v10 patch addresses the comments and is rebased on the {{trunk}} branch. As there are conflicts because of [HDFS-4015], I also invited [~arpitagarwal] and [~anu] to review the patch. See my responses inline to [~wheat9]'s comments.

{quote}
1. It does not give much information compared to figuring out the issues on the code directly. What does "thresholds met" / "extensions reached" mean? It causes more confusions than explanations.
{quote}
The motivation is that the diagram is helpful at a glance, provided we define "thresholds met" / "extension reached". In the v10 patch, I add more explanation in the comments.

{quote}
{code}
LOG.error("Non-recognized block manager safe mode status: {}", status);
{code}
2. Should be an assert.
{quote}
Indeed, I'll simply use {{assert false : "some comment"}}.

{quote}
{code}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;
...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
3. How do these two variables synchronize? Is the system in consistent state in the middle of the execution?
{quote}
Good question. Actually it is not in a consistent state in the middle of the execution. If {{resourcesLow}} is true and the name node was in manual safe mode beforehand, then in the middle of the execution {{isInSafeMode}} will return false, which means safe mode is OFF. The main reason is that writes to the two variables (i.e. entering/leaving safe mode) are guarded by the FSNamesystem write lock, while reads are not. The enum-typed state was replaced with two boolean flags in the v7 patch, because the two-layer state machine was cumbersome, per offline discussion with [~wheat9] and [~jingzhao]. Guarding all the reads looks expensive. Bitwise operations on a flag variable seem tricky. The new design goes back to the {{trunk}} logic, which makes the block manager stay in safe mode in the middle of the execution:
{code}
if (resourcesLow) {
  isInResourceLowSafeMode = true;
} else {
  isInManualSafeMode = true;
}
{code}
In case both {{isInManualSafeMode}} and {{isInResourceLowSafeMode}} are true, {{isInResourceLowSafeMode}} will short-circuit {{isInManualSafeMode}}, according to our current logic, e.g. {{getTurnOffTip()}}.

{quote}
{code}
+// INITIALIZED -> THRESHOLD
+bmSafeMode.setBlockTotal(BLOCK_TOTAL);
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+assertTrue(bmSafeMode.isInSafeMode());
{code}
4. It makes sense to put it in a test instead of in the @Before method.
{quote}
That makes sense to me. I'll add a new test called {{testSetBlockTotal}}.

{quote}
{code}
+// EXTENSION -> OFF
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.EXTENSION);
+reachBlockThreshold();
+reachExtension();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.OFF, getSafeModeStatus());
{code}
5. Please refactor the code - you can reuse the getSafeModeStatus() that is defined by the class.
{quote}
Actually, the first statement of each state transition test *sets* the internal state. I'll make the fields package accessible so that Whitebox is not needed. See the end of this comment.

{quote}
{code}
assertEquals(getBlockSafe(), i);
{code}
6. The expected value should go first according to the function signature.
{quote}
Nice catch. I will revise the whole test to fix all similar cases.

{quote}
7. A higher level question is that why it needs so many getInternalState() statements? It looks to me that many of these behaviors can be observed outside without whitebox testing.
{quote}
The reason was that *all* non-static fields in {{BlockManagerSafeMode}} are private. By design, I made them _private_ because we don't need access to its internal state in {{BlockManager}}. In the new design, the v10 patch simply makes the fields package private so that the test is more straightforward.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch,
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch,
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
>
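Editor's sketch: the inconsistency discussed under comment 3 of the v10 message above can be reproduced deterministically by observing the flags between the two writes. The class {{FlagUpdateOrder}} and its method names are hypothetical and only model the two update orders; they are not code from any patch. A real unlocked reader would hit the same window only under unlucky thread scheduling.

```java
/** Hypothetical model of the two update orders; fields are plain for clarity. */
final class FlagUpdateOrder {
  boolean isInManualSafeMode;
  boolean isInResourceLowSafeMode;

  boolean isInSafeMode() {
    return isInManualSafeMode || isInResourceLowSafeMode;
  }

  /**
   * Overwrites both flags, as in the quoted snippet. Returns what a reader
   * that does not hold the write lock could observe between the two writes.
   */
  boolean overwriteBothFlags(boolean resourcesLow) {
    isInManualSafeMode = !resourcesLow;
    boolean seenBetweenWrites = isInSafeMode();
    isInResourceLowSafeMode = resourcesLow;
    return seenBetweenWrites;
  }

  /** Only ever raises a flag, so safe mode cannot appear to be off midway. */
  void raiseOneFlag(boolean resourcesLow) {
    if (resourcesLow) {
      isInResourceLowSafeMode = true;
    } else {
      isInManualSafeMode = true;
    }
  }
}
```

Starting from manual safe mode and entering with low resources, the first variant reports "not in safe mode" between the writes, while the second keeps at least one flag raised throughout.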
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.008.patch

The v8 patch addresses one checkstyle warning and one failing unit test in the v7 patch. The other checkstyle warnings are pre-existing and cannot be resolved in this patch. The findbugs warnings are unrelated. The other failing unit test seems flaky. I am working on a new unit test, {{TestBlockManagerSafeMode}}, for the next patch.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch,
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch,
> HDFS-9129.008.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.009.patch

In the v8 patch, the findbugs warning is unrelated, and the checkstyle warnings are about method or file length, which we cannot do anything about in this patch. The overall test results look good. Per offline discussion with [~jingzhao] and [~wheat9], this patch adds a new white-box unit test class named {{TestBlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch,
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch,
> HDFS-9129.008.patch, HDFS-9129.009.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.007.patch

Thank you [~wheat9] for your detailed comments. The v7 patch addresses most of them. See my responses inline.

{quote}
{code}
if (status == BMSafeModeStatus.OFF) {
  return;
}
{code}
1. There are multiple cases like the above. They should be asserts.
{quote}
It is safe for these methods to be called multiple times. If safe mode is not currently on, the call is a no-op. Previously we checked whether {{safeMode}} was null before calling the respective methods, e.g.
{code:title=previous volatile and null check to allow calling multiple times}
public void checkSafeMode() {
  // safeMode is volatile, and may be set to null at any time
  SafeModeInfo safeMode = this.safeMode;
  if (safeMode != null) {
    safeMode.checkMode();
  }
}
{code}
As we moved the startup safe mode to {{BlockManager}} and maintain a state machine for the safe mode status, the volatile and null {{safeMode}} trick is no longer needed. Meanwhile, we must allow {{BlockManagerSafeMode}}'s methods to be called multiple times without side effects. I will explicitly elaborate on this in the comments for these methods.

{quote}
2. namesystem.isHaEnabled() does not change in the lifecycle of the process.
{quote}
This is a very good point. I considered this, but the {{blockManager}}, which creates the {{blockManagerSafeMode}}, is constructed before {{namesystem#haEnabled}} is initialized. Per offline discussion with [~wheat9] and [~jingzhao], we can initialize {{namesystem#haEnabled}} before constructing the {{blockManager}}. This way, {{namesystem.isHaEnabled}} is not called repeatedly in the critical path.

{quote}
3. It's better to document the conditions of state transition.
{quote}
Yes, it makes a lot of sense to document the state machine transitions. In the v7 patch, I add a diagram in the comment.

{quote}
{code}
+needExtension = extension > 0 &&
+(blockThreshold > 0 || datanodeThreshold > 0);
{code}
4. This can be moved under the THRESHOLD statement and become a local variable.
{quote}
Actually it's hard, if not impossible, largely because {{needExtension}} should be initialized in the start status, i.e. {{INITIALIZED}}. There is a regression test for this case, {{TestHASafeMode#testBlocksRemovedBeforeStandbyRestart}}, introduced by [HDFS-2692].

{quote}
5. initializeReplQueuesIfNecessary() should be called only once.
{quote}
Yes, we should initialize the replication queue only once. When called for the first time, {{BlockManager#initializeReplQueues}} sets a flag indicating that the replication queue is initialized. We check this flag in {{isPopulatingReplQueues}} before calling {{initializeReplQueues}} again. As it is of great importance to guarantee this, I'll double check it and fix it in the next patch.

{quote}
6. safeModeStatus = SafeModeStatus.OFF should be moved to BlockManagerSafeMode.
{quote}
The {{safeModeStatus}} was for the {{FSNamesystem}}, and *OFF* here indicates that both {{FSNamesystem}} and {{BlockManager}} have left safe mode. {{BlockManagerSafeMode}}'s internal {{status}} was maintained in its own {{leaveSafeMode}} method. Per offline discussion with [~wheat9] and [~jingzhao], the better design is to simplify the {{Namesystem}} safe mode to two flags indicating _manual_ or _resource low_. In this way, the safe mode check is pretty straightforward. The side benefit is that when we extend the current safe mode status, one more flag will work just fine, without breaking the existing code.
{code:title=new manual and resource low safe mode flag}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;
...
@Override
public boolean isInSafeMode() {
  return isInManualSafeMode || isInResourceLowSafeMode ||
      blockManager.isInSafeMode();
}
{code}

{quote}
7. A cleaner approach is to put the {{reached}} timestamp into the constructor of {{SafeModeMonitor()}}.
{quote}
It's a good point to define the {{reached}} value in the monitor. The v7 patch initializes it when the monitor starts. As the {{reached}} timestamp is partly used outside {{SafeModeMonitor}}, e.g. in {{getSafeModeTip}} and {{checkMode}}, the easy (though maybe not the best) way is to treat it as a class field.

{quote}
8. It might be good to have additional unit tests for BlockManagerSafeMode.
{quote}
That makes perfect sense to me. I'll add a new unit test named {{TestBlockManagerSafeMode}} in the next patch.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
>
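Editor's sketch: responses 1 and 5 in the v7 message above both rely on methods that are safe to call repeatedly. A minimal illustration of that pattern follows. The class {{IdempotentSafeModeSketch}}, its two-value status, and the {{initCount}} counter are hypothetical and exist only to make the behaviour observable. That leaving safe mode triggers the queue initialization is an assumption here, not something the message states.

```java
/** Hypothetical sketch of no-op-when-OFF and initialize-once guards. */
final class IdempotentSafeModeSketch {
  enum Status { ON, OFF }

  private Status status = Status.ON;
  private boolean replQueuesInitialized = false;
  int initCount = 0; // counts real initializations, for illustration only

  /** Guarded by a flag, so repeated calls initialize the queues only once. */
  synchronized void initializeReplQueuesIfNecessary() {
    if (replQueuesInitialized) {
      return;
    }
    replQueuesInitialized = true;
    initCount++;
  }

  /** Calling this when safe mode is already OFF is a no-op. */
  synchronized void leaveSafeMode() {
    if (status == Status.OFF) {
      return;
    }
    initializeReplQueuesIfNecessary(); // assumption: leaving populates the queues
    status = Status.OFF;
  }

  synchronized boolean isInSafeMode() {
    return status != Status.OFF;
  }
}
```

An early return keeps callers simple, since they do not need to know the current status. An assert, as the reviewer suggested, would instead treat a repeated call as a programming error.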
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.006.patch

As per offline discussion, the v6 patch prefers a synchronized {{getSafeModeTip}} to volatile fields. Some public methods in {{BlockManagerSafeMode}} that existed only for tests are removed as well.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch,
> HDFS-9129.005.patch, HDFS-9129.006.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.004.patch

The v4 patch addresses the failing tests; further refactoring is possible. Please hold off on reviewing it.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.005.patch

The failing test is not related. The v5 patch no longer tracks safe blocks after leaving safe mode. Any comments are welcome. I will work on reducing the synchronization overhead in {{BlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch,
> HDFS-9129.005.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.003.patch

In the v3 patch, the {{Namesystem}} does not move from the *STARTUP* state to the *OFF* state itself. Instead, it asks the {{BlockManager}} every time its {{isInStartupSafeMode}} is called. This way, we are able to simplify the {{Namesystem}}'s state machine by delegating all *STARTUP* safe mode logic, which includes leaving safe mode, to {{BlockManager}}, and in turn to {{BlockManagerSafeMode}}.

The v3 patch is for Jenkins. Please hold off on reviewing it.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
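Editor's sketch: the delegation described in the v3 message above could be modelled as below. Everything here is illustrative. The class {{NamesystemSafeModeSketch}} is hypothetical, the block manager is reduced to a {{BooleanSupplier}}, and the status names are borrowed from the v2 design notes in this thread rather than from the patch itself.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical first-level safe mode that delegates STARTUP to the block manager. */
final class NamesystemSafeModeSketch {
  enum Status { STARTUP, MANUALLY, RESOURCE_LOW, OFF }

  private Status status = Status.STARTUP;
  private final BooleanSupplier blockManagerInSafeMode; // stand-in for the block manager

  NamesystemSafeModeSketch(BooleanSupplier blockManagerInSafeMode) {
    this.blockManagerInSafeMode = blockManagerInSafeMode;
  }

  /** Never flips STARTUP to OFF itself; it asks the block manager on every call. */
  boolean isInStartupSafeMode() {
    return status == Status.STARTUP && blockManagerInSafeMode.getAsBoolean();
  }

  /** Explicit transitions, e.g. an administrator entering safe mode manually. */
  void setStatus(Status newStatus) {
    status = newStatus;
  }
}
```

The first level then holds no copy of the startup progress, so there is nothing to keep in sync when the block manager leaves safe mode on its own.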
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9129:
    Attachment: HDFS-9129.002.patch

Thank you [~jingzhao]. The v2 patch addresses the comments.

Thank you [~daryn] for your input. I briefly illustrate the current design as follows. The patch is not yet complete and further refactoring may be necessary.

Basically, the patch splits the name node safe mode into two levels. The first one is the {{FSNamesystem}} and the second one is {{BlockManagerSafeMode}}. The main code change has two parts:
# The first-level safe mode code is kept in {{FSNamesystem}}
# The second-level safe mode is moved to the {{blockmanagement}} package

At the beginning, the name node is in *STARTUP* safe mode, where the block manager is tracking blocks and data nodes. The name node will leave *STARTUP* mode to
* *OFF*: if either of the two conditions is reached
*# The second-level safe mode is *OFF*. This is the case when the block manager leaves safe mode automatically once the thresholds and extension are met
*# The administrator leaves safe mode manually
* *MANUALLY*: the administrator enters safe mode manually
* *RESOURCE_LOW*: low resources are detected

The first-level safe mode is a simple state machine. Other transitions, such as *MANUALLY* to *OFF*, are straightforward.

As inferred from the above, the second level is meaningful and valid if and only if the first-level safe mode is in *STARTUP*. At the beginning, the block manager is in *INITIALIZED* mode, and it will leave this mode if:
* thresholds are met (to *OFF* mode, as no extension is needed)
* thresholds are not met (to *THRESHOLD* mode)

The *THRESHOLD* mode is pending on the block and data node thresholds. If the thresholds are met, the block manager will leave this mode, and change to:
* *OFF* if extension is not needed (e.g. the {{extension}} config value is 0)
* *EXTENSION* if extension is needed

The *EXTENSION* mode is pending on the extension period. The block manager will leave this mode to *OFF* if both conditions are reached:
* the extension period is reached (checked by a monitor thread)
* thresholds are met

The main design motivation is that the {{FSNamesystem}} and {{BlockManager}} each maintain their own state.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
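Editor's sketch: the second-level transitions listed in the v2 message above can be written as a pure transition function. This is one reading of that description, not the patch's implementation. The class {{BlockManagerSafeModeSketch}}, the method {{next}} and its parameters are hypothetical; only the four status names come from the thread. Sending *INITIALIZED* to *THRESHOLD* when the thresholds are met but an extension is needed is an assumption, since the message does not spell that case out.

```java
/** Hypothetical sketch of the second-level (block manager) safe mode transitions. */
final class BlockManagerSafeModeSketch {
  enum Status { INITIALIZED, THRESHOLD, EXTENSION, OFF }

  /**
   * Computes the next status from the current one.
   *
   * @param thresholdsMet    block and data node thresholds are satisfied
   * @param needExtension    an extension period is configured (extension > 0)
   * @param extensionElapsed the extension period has passed
   */
  static Status next(Status current, boolean thresholdsMet,
                     boolean needExtension, boolean extensionElapsed) {
    switch (current) {
      case INITIALIZED:
        // Straight to OFF only when nothing is left to wait for (assumption).
        return (thresholdsMet && !needExtension) ? Status.OFF : Status.THRESHOLD;
      case THRESHOLD:
        if (!thresholdsMet) {
          return Status.THRESHOLD; // still pending on the thresholds
        }
        return needExtension ? Status.EXTENSION : Status.OFF;
      case EXTENSION:
        // Both conditions must hold: period elapsed and thresholds still met.
        return (thresholdsMet && extensionElapsed) ? Status.OFF : Status.EXTENSION;
      default:
        return Status.OFF; // OFF is terminal in this sketch
    }
  }
}
```

Keeping the transition logic free of side effects makes each arrow of the state machine testable without whitebox access to private fields.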
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.000.patch > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Status: Patch Available (was: Open) > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.001.patch The v1 patch fixes the failing tests (locally) and whitespace warnings. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)