[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

Vinitha Reddy Gankidi (JIRA) Tue, 24 May 2016 17:50:15 -0700

    [ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299255#comment-15299255 ]


Vinitha Reddy Gankidi commented on HDFS-10301:
----------------------------------------------

Thanks for your review [~cmccabe]. By legacy reports do you mean block reports 
from DNs before the concept of leases was introduced for block reports? 

{code}
public synchronized boolean checkLease(DatanodeDescriptor dn,
                                       long monotonicNowMs, long id) {
  if (id == 0) {
    LOG.debug("Datanode {} is using BR lease id 0x0 to bypass " +
        "rate-limiting.", dn.getDatanodeUuid());
    return true;
  }
  NodeData node = nodes.get(dn.getDatanodeUuid());
  if (node == null) {
    LOG.info("BR lease 0x{} is not valid for unknown datanode {}",
        Long.toHexString(id), dn.getDatanodeUuid());
    return false;
  }
  if (node.leaseId == 0) {
    LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
             "is not in the pending set.",
             Long.toHexString(id), dn.getDatanodeUuid());
    return false;
  }
  // ... remaining checks elided ...
}
{code}

Isn't {{id}} equal to 0 for legacy block reports and when block reports are 
manually triggered? My understanding is that {{node.leaseId}} is set to zero 
only when the lease is removed. In my patch, the lease is removed only after the 
last RPC of a block report has been processed, which is determined from the 
current RPC index in the block report context:

{code}
if (context != null) {
  if (context.getTotalRpcs() == context.getCurRpc() + 1) {
    long leaseId = this.getBlockReportLeaseManager().removeLease(node);
    BlockManagerFaultInjector.getInstance().removeBlockReportLease(node, leaseId);
  }
}
{code}

When storage reports are processed out of order, {{node.leaseId}} may be set to 
0 before all of the DN's storage reports have been processed. Therefore, instead 
of rejecting a storage report when {{node.leaseId=0}}, we log a message and 
continue processing it. Please let me know if you see any issue with this approach.
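To make the scenario concrete, here is a minimal, hypothetical Java model of this approach. {{LeaseModel}}, {{processRpc}} and its fields are illustrative names, not HDFS APIs, and each RPC is assumed to carry exactly one storage report. As in the patch, the lease is dropped after the RPC with index {{totalRpcs - 1}} is processed; if that RPC happens to be processed first, the remaining storage report arrives after the lease is gone and is logged and processed rather than rejected.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the lease handling in the patch.
 * Not HDFS code: one storage report per RPC, no real RPC layer.
 */
class LeaseModel {
  // Lease id the NameNode holds for this DataNode; 0 means "no lease".
  long nodeLeaseId;
  // Storage IDs whose reports were processed, in processing order.
  final List<String> processed = new ArrayList<>();
  // Messages "logged" when a report arrives after the lease was removed.
  final List<String> log = new ArrayList<>();

  LeaseModel(long leaseId) {
    this.nodeLeaseId = leaseId;
  }

  /** Processes RPC number curRpc (0-based) out of totalRpcs. */
  void processRpc(long reportLeaseId, String storageId,
                  int curRpc, int totalRpcs) {
    // A report lease id of 0 bypasses lease checks (as in checkLease above).
    if (reportLeaseId != 0 && nodeLeaseId == 0) {
      // The lease may already have been removed because the last RPC was
      // processed out of order: log it and keep processing.
      log.add("lease already removed, still processing " + storageId);
    }
    processed.add(storageId);
    // Remove the lease once the last RPC of the report has been processed.
    if (totalRpcs == curRpc + 1) {
      nodeLeaseId = 0;
    }
  }

  public static void main(String[] args) {
    LeaseModel nn = new LeaseModel(0x42L);
    // The last RPC (index 1 of 2) is processed first and removes the lease...
    nn.processRpc(0x42L, "DS-2", 1, 2);
    // ...so RPC index 0 arrives with no lease left, but is still processed.
    nn.processRpc(0x42L, "DS-1", 0, 2);
    System.out.println(nn.processed);  // [DS-2, DS-1]
    System.out.println(nn.log.size()); // 1
  }
}
```

Without the "log and continue" branch, the model would have to drop {{DS-1}}, which is exactly the storage report that the out-of-order case loses.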

During upgrades, we do not remove zombie storages. Once the upgrade is 
finalized, we remove zombie storages, but only for block reports sent in a 
single RPC:
{code}
if (nn.getFSImage().isUpgradeFinalized() && noStaleStorages) {
  Set<String> storageIDsInBlockReport = new HashSet<>();
  if (context.getTotalRpcs() == 1) {
    for (StorageBlockReport report : reports) {
      storageIDsInBlockReport.add(report.getStorage().getStorageID());
    }
    bm.removeZombieStorages(nodeReg, context, storageIDsInBlockReport);
  }
}
{code}
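A rough sketch of the guard above, assuming that {{removeZombieStorages}} treats as zombies the storages the NameNode still tracks for the DataNode whose IDs are absent from {{storageIDsInBlockReport}}. {{ZombieCheck.zombies}} is a hypothetical helper, not part of the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of the patch. */
class ZombieCheck {
  /**
   * Storages known for a DataNode but missing from its block report.
   * Returns an empty set whenever zombie removal would be skipped:
   * upgrade not finalized, stale storages present, or a multi-RPC report.
   */
  static Set<String> zombies(Set<String> knownStorageIds,
                             Set<String> reportedStorageIds,
                             boolean upgradeFinalized,
                             boolean noStaleStorages,
                             int totalRpcs) {
    Set<String> result = new TreeSet<>(); // sorted for stable output
    if (!upgradeFinalized || !noStaleStorages || totalRpcs != 1) {
      return result;
    }
    result.addAll(knownStorageIds);
    result.removeAll(reportedStorageIds);
    return result;
  }

  public static void main(String[] args) {
    Set<String> known = new HashSet<>(Arrays.asList("DS-1", "DS-2", "DS-3"));
    Set<String> reported = new HashSet<>(Arrays.asList("DS-1", "DS-3"));
    System.out.println(zombies(known, reported, true, true, 1));  // [DS-2]
    System.out.println(zombies(known, reported, false, true, 1)); // []
  }
}
```

Because the storage IDs are collected from a single RPC, the set difference always sees the DataNode's complete list of storages, so it cannot be confused by reports split across RPCs.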

Can you please elaborate on what you meant by "In general, your solution 
doesn't fix the problem during upgrade". What problems do you foresee?

I am currently investigating why the test 
{{TestAddOverReplicatedStripedBlocks}} failed.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report, and 
> then it sends the block report again. While processing these two reports at the 
> same time, the NameNode can interleave the processing of storages from the 
> different reports. This corrupts the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.
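For illustration only, a simplified Java model of the race described above. It assumes each storage records the ID of the last block report that processed it, and that when a report finishes, every storage stamped with a different ID is declared a zombie; {{InterleavingDemo}} and its methods are hypothetical names, not HDFS code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of interleaved block report processing. */
class InterleavingDemo {
  // storage ID -> ID of the last block report that processed it
  final Map<String, Long> lastReportId = new LinkedHashMap<>();

  InterleavingDemo(String... storageIds) {
    for (String s : storageIds) {
      lastReportId.put(s, 0L);
    }
  }

  /** Processes one storage of the report with the given ID. */
  void processStorage(long reportId, String storageId) {
    lastReportId.put(storageId, reportId);
  }

  /** Called after the last storage of a report; returns the "zombies". */
  List<String> finishReport(long reportId) {
    List<String> zombies = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
      if (e.getValue() != reportId) {
        zombies.add(e.getKey());
      }
    }
    return zombies;
  }

  public static void main(String[] args) {
    InterleavingDemo nn = new InterleavingDemo("DS-A", "DS-B");
    // The original report (ID 1) and its retransmission (ID 2) interleave.
    nn.processStorage(1L, "DS-A");
    nn.processStorage(2L, "DS-A");
    nn.processStorage(2L, "DS-B");
    System.out.println(nn.finishReport(2L)); // []
    nn.processStorage(1L, "DS-B");
    // DS-A was last stamped by report 2, so report 1 declares it a zombie.
    System.out.println(nn.finishReport(1L)); // [DS-A]
  }
}
```

Both reports describe the same healthy storages, yet the late finish of report 1 falsely flags {{DS-A}}.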



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
