[ https://issues.apache.org/jira/browse/HDFS-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790524#comment-16790524 ]

Istvan Fajth edited comment on HDFS-14315 at 3/12/19 3:16 PM:
--------------------------------------------------------------

Hi,

I ran into this issue while investigating the same behaviour, and I wanted to 
add some additional information I found, which may also change the direction of 
the ticket a bit.

I ran into the following scenario (a minimal shell sketch that reproduces it 
follows below):
 Given:
 * a file that consists of one block, with replication factor 3
 * a snapshot created that contains this file

When:
 * the file's replication factor is set to 2 with hdfs dfs -setrep 2 /path/to/file
 * and then the snapshot that still holds the file with replication factor 3 is removed

Then:
 * the file keeps 3 block replicas until the next full block report (FBR) arrives 
from all the DataNodes holding replicas of the block
 * after a NameNode restart all 3 replicas are reported as stale by hdfs fsck 
<blockID>, and they stay stale until the FBR arrives; effectively this is what 
prevents the removal of the excess replica, according to this log message and 
the code around it:
 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: BLOCK* 
rescanPostponedMisreplicatedBlocks: Re-scanned block <blockID>, result is 
POSTPONE
 * before the NameNode restart the replicas are not in a stale state
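
For reference, this is roughly how I reproduced it from the shell. It is only a 
sketch: the /repro path, the snapshot name s1 and the local file name are 
placeholders of mine, not taken from the original report, and it assumes a 
cluster where snapshots can be enabled on the test directory.
{noformat}
# create a single-block test file and make sure it has replication factor 3
hdfs dfs -mkdir /repro
hdfs dfs -put localfile /repro/file
hdfs dfs -setrep 3 /repro/file

# snapshot the directory while the file still has replication factor 3
hdfs dfsadmin -allowSnapshot /repro
hdfs dfs -createSnapshot /repro s1

# lower the replication factor, then remove the snapshot that still
# references the file with replication factor 3
hdfs dfs -setrep 2 /repro/file
hdfs dfs -deleteSnapshot /repro s1

# until the next FBRs arrive, fsck still reports 3 live replicas
hdfs fsck /repro/file -files -blocks -locations
{noformat}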

I think the problem is that the block is in an unexpected state that prevents 
the excess replica from being removed, and that we might want to examine this 
further, possibly in another JIRA, it is up to you. I am not confident enough to 
decide whether the described behaviour is expected or a bug; I just wanted to 
add it here so we can discuss it as well and decide whether there is something 
to fix.
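
In case it helps whoever looks into this, this is how I checked the state after 
the NameNode restart. The block ID, the NameNode host:port and the log path are 
placeholders; the daemonlog call only raises the BlockManager log level at 
runtime so that the POSTPONE message quoted above becomes visible.
{noformat}
# enable DEBUG logging for BlockManager on the running NameNode
hadoop daemonlog -setlevel <nn-host>:<nn-http-port> \
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager DEBUG

# ask fsck about the block directly; in my runs the replicas stayed
# stale here until the full block reports arrived
hdfs fsck -blockId blk_<blockID>

# watch the NameNode log for the re-scan result of the postponed block
grep rescanPostponedMisreplicatedBlocks /path/to/namenode.log
{noformat}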


> Add more detailed log message when decreasing replication factor < max in 
> snapshots
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-14315
>                 URL: https://issues.apache.org/jira/browse/HDFS-14315
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.2
>            Reporter: Daisuke Kobayashi
>            Assignee: Daisuke Kobayashi
>            Priority: Minor
>         Attachments: HDFS-14315.000.patch, HDFS-14315.001.patch
>
>
> When changing the replication factor for a given file, the following 3 types of 
> log messages appear in the namenode log, but a more detailed message would be 
> useful, especially when the file is in snapshot(s).
> {noformat}
> Decreasing replication from X to Y for FILE
> Increasing replication from X to Y for FILE
> Replication remains unchanged at X for FILE
> {noformat}
> I have added additional log messages as well as further test scenarios to 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotReplication#testReplicationWithSnapshot.
> The test sequence is:
> 1) A file is created with replication factor 1 (there are 5 datanodes)
> 2) Take a snapshot and increase the factor by 1. Continue this loop up to 5.
> 3) Setrep back to 3, but the effective replication is decided to be 4, which is 
> the maximum in snapshots.
> {noformat}
> 2019-02-25 17:17:26,800 [IPC Server handler 9 on default port 55726] INFO  
> namenode.FSDirectory (FSDirAttrOp.java:unprotectedSetReplication(408)) - 
> Decreasing replication from 5 to 4 for /TestSnapshot/sub1/file1. Requested 
> value is 3, but 4 is the maximum in snapshots
> {noformat}
> 4) Setrep to 3 again, but it's unchanged as follows. Both 3) and 4) are 
> expected.
> {noformat}
> 2019-02-25 17:17:26,804 [IPC Server handler 6 on default port 55726] INFO  
> namenode.FSDirectory (FSDirAttrOp.java:unprotectedSetReplication(420)) - 
> Replication remains unchanged at 4 for /TestSnapshot/sub1/file1 . Requested 
> value is 3, but 4 is the maximum in snapshots.
> {noformat}
> 5) Make sure the number of replicas in datanodes remains 4.


