[jira] [Updated] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

Toshihiro Suzuki (Jira) Wed, 03 Jun 2020 21:12:25 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Toshihiro Suzuki updated HDFS-15386:
------------------------------------
    Description: 
When removing volumes, we need to invalidate all the blocks in the volumes. In 
the following code (FsDatasetImpl), we keep the blocks that will be invalidated 
in the *blkToInvalidate* map. However, because the key of the map is *bpid* 
(Block Pool ID), the entry for a block pool is overwritten each time another 
removed volume is processed. As a result, the map ends up holding only the 
blocks of the last volume we are removing, and only those are invalidated:
{code:java}
for (String bpid : volumeMap.getBlockPoolList()) {
  List<ReplicaInfo> blocks = new ArrayList<>();
  for (Iterator<ReplicaInfo> it =
        volumeMap.replicas(bpid).iterator(); it.hasNext();) {
    ReplicaInfo block = it.next();
    final StorageLocation blockStorageLocation =
        block.getVolume().getStorageLocation();
    LOG.trace("checking for block " + block.getBlockId() +
        " with storageLocation " + blockStorageLocation);
    if (blockStorageLocation.equals(sdLocation)) {
      blocks.add(block);
      it.remove();
    }
  }
  blkToInvalidate.put(bpid, blocks);
}
{code}
[https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]

  was:
When removing volumes, we need to invalidate all the blocks in the volumes. In 
the following code (FsDatasetImpl), we keep the blocks that will be invalidated 
in the *blkToInvalidate* map. However, as the key of the map is *bpid* (Block Pool 
ID), it will be overwritten by other removed volumes. As a result, the map will 
have only the blocks of the last volume, and invalidate only them:
{code:java}
for (String bpid : volumeMap.getBlockPoolList()) {
  List<ReplicaInfo> blocks = new ArrayList<>();
  for (Iterator<ReplicaInfo> it =
        volumeMap.replicas(bpid).iterator(); it.hasNext();) {
    ReplicaInfo block = it.next();
    final StorageLocation blockStorageLocation =
        block.getVolume().getStorageLocation();
    LOG.trace("checking for block " + block.getBlockId() +
        " with storageLocation " + blockStorageLocation);
    if (blockStorageLocation.equals(sdLocation)) {
      blocks.add(block);
      it.remove();
    }
  }
  blkToInvalidate.put(bpid, blocks);
}
{code}
[https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]


> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15386
>                 URL: https://issues.apache.org/jira/browse/HDFS-15386
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>
> When removing volumes, we need to invalidate all the blocks in the volumes. 
> In the following code (FsDatasetImpl), we keep the blocks that will be 
> invalidated in the *blkToInvalidate* map. However, because the key of the map 
> is *bpid* (Block Pool ID), the entry for a block pool is overwritten each time 
> another removed volume is processed. As a result, the map ends up holding only 
> the blocks of the last volume we are removing, and only those are invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>         volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]
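
The overwrite described above can be reproduced outside Hadoop with a minimal, self-contained sketch. This is not the FsDatasetImpl code and not the patch for this issue: the class name and the helpers volume, collectWithPut and collectWithAppend are hypothetical, and plain Long block IDs stand in for ReplicaInfo. It only illustrates why a put() keyed on bpid keeps the last volume's blocks, and shows one possible way to accumulate them instead (appending to the existing list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hadoop.
class BlkToInvalidateSketch {

  // Stand-in for one removed volume: block IDs grouped by block pool ID.
  static Map<String, List<Long>> volume(String bpid, Long... blockIds) {
    Map<String, List<Long>> v = new HashMap<>();
    v.put(bpid, new ArrayList<>(Arrays.asList(blockIds)));
    return v;
  }

  // Mirrors the reported pattern: put() replaces whatever an earlier
  // removed volume stored under the same bpid.
  static Map<String, List<Long>> collectWithPut(
      List<Map<String, List<Long>>> removedVolumes) {
    Map<String, List<Long>> blkToInvalidate = new HashMap<>();
    for (Map<String, List<Long>> vol : removedVolumes) {
      for (Map.Entry<String, List<Long>> e : vol.entrySet()) {
        blkToInvalidate.put(e.getKey(), new ArrayList<>(e.getValue()));
      }
    }
    return blkToInvalidate;
  }

  // One possible alternative (an assumption, not the committed fix):
  // append to the list already stored for the bpid.
  static Map<String, List<Long>> collectWithAppend(
      List<Map<String, List<Long>>> removedVolumes) {
    Map<String, List<Long>> blkToInvalidate = new HashMap<>();
    for (Map<String, List<Long>> vol : removedVolumes) {
      for (Map.Entry<String, List<Long>> e : vol.entrySet()) {
        blkToInvalidate
            .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
            .addAll(e.getValue());
      }
    }
    return blkToInvalidate;
  }

  public static void main(String[] args) {
    // Two removed volumes that share the same block pool.
    List<Map<String, List<Long>>> removed =
        Arrays.asList(volume("BP-1", 1L, 2L), volume("BP-1", 3L));
    System.out.println(collectWithPut(removed));    // {BP-1=[3]}
    System.out.println(collectWithAppend(removed)); // {BP-1=[1, 2, 3]}
  }
}
```

With two removed volumes in the same block pool, the put() variant keeps only block 3, so blocks 1 and 2 would never be invalidated, which matches the symptom in the issue title.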



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
