[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994706#comment-14994706
 ] 

Jing Zhao commented on HDFS-9129:
---------------------------------

The latest patch looks good to me overall. Here're my comments:
# Let's still define {{BlockManagerSafeMode#status}} as private field, and 
provide getter/setter if necessary. In this way we can have better control of 
its value. Similarly for {{blockTotal}} and {{blockSafe}}.
# The following two initializations may be wrong: with the patch the safemode 
object is created when contructing BlockManager, before loading fsimage and 
editlog from disk.
{code}
private final long startTime = monotonicNow();
{code}
{code}
private long lastStatusReport = monotonicNow();
{code}
# {{shouldIncrementallyTrackBlocks}} is actually determined by {{haEnabled}} 
thus looks like it can be declared as final and {{isSafeModeTrackingBlocks}} 
can be simplified.
# {{BlockManagerSafeMode#setBlockTotal}} currently does two things: 1) updating 
threshold numbers, and 2) triggering mode check. We can separate #2 out of this 
method, and then {{activate}} does not need to do an unnecessary check.
# {{reached}} can be renamed to {{reachedTime}}
# In the old safemode semantic, once entering the extension state, NN never 
comes back to the normal safemode state, but can keep waiting in the extension 
state if the threshold is not met again. The current implementation changes 
this semantic. It's better to avoid this change here.
{code}
    case EXTENSION:
      if (!areThresholdsMet()) {
        // EXTENSION -> PENDING_THRESHOLD
        status = BMSafeModeStatus.PENDING_THRESHOLD;
      } 
{code}
# The following code can be simplified.
{code}
    if (status == BMSafeModeStatus.OFF) {
      return;
    }
    if (!shouldIncrementallyTrackBlocks) {
      return;
    }
{code}
# In {{adjustBlockTotals}}, the {{setBlockTotal}} call should be out of the 
synchronized block.
{code}
    synchronized (this) {
      ...
      blockSafe += deltaSafe;
      setBlockTotal(blockTotal + deltaTotal);
    }
{code}
# Not caused by this patch, but since {{doConsistencyCheck}} sometimes is not 
protected by any lock (e.g., {{computeDatanodeWork}}), the total number of 
blocks retrieved from blockManager and used by the consistency check can be 
inaccurate. So I think here we can replace the AssertionError to a warning log 
message.
# Let's still name the first parameter of {{incrementSafeBlockCount}} as 
"storageNum".
# In {{decrementSafeBlockCount}}, {{checkSafeMode}} only needs to be called 
when the first time the live replica number drops below the safe number. Thus 
{{checkSafeMode}} should be called within the if.
{code}
      if (blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
        this.blockSafe--;
      }
      assert blockSafe >= 0;
      checkSafeMode();
{code}

> Move the safemode block count into BlockManager
> -----------------------------------------------
>
>                 Key: HDFS-9129
>                 URL: https://issues.apache.org/jira/browse/HDFS-9129
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>            Assignee: Mingliang Liu
>         Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to