[ 
https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060775#comment-16060775
 ] 

Rakesh R commented on HDFS-11965:
---------------------------------

Thanks [~surendrasingh] for working on this. 

Putting sleep is one way to fix the problem. I'm more inclined to the 2nd 
proposal and fix the unit testing failure.

But the real problem here is SPS not considering all the block replica 
movements due to the delay in datanode registration and corresponding block 
reports. Currently, SPS is scheduling the blocks movement only for those 
locations/datanodes which are registered and once all these scheduled replicas 
are satisfied the policy it is removing the {{sps xattr}} from the Inode. Here 
it is not giving chance to those datanode replicas which have delay in 
registration and block reports. Its a kind of situation that block is under 
replicated, right?

I had an offline discussion with [~umamaheswararao]. How about before removing 
the {{sps xattr}} from the Inode, check whether all the block replicas have 
given a chance to do the block movement, if not wait for certain period to 
finish the under replication task.

> [SPS] Fix TestPersistentStoragePolicySatisfier#testWithCheckpoint failure
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11965
>                 URL: https://issues.apache.org/jira/browse/HDFS-11965
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>
> The test case is failing because all the required replicas are not moved in 
> expected storage. This is happened because of delay in datanode registration 
> after cluster restart.
> Scenario :
> 1. Start cluster with 3 DataNodes.
> 2. Create file and set storage policy to WARM.
> 3. Restart the cluster.
> 4. Now Namenode and two DataNodes started first and  got registered with 
> NameNode. (one datanode  not yet registered)
> 5. SPS scheduled block movement based on available DataNodes (It will move 
> one replica in ARCHIVE based on policy).
> 6. Block movement also success and Xattr removed from the file because this 
> condition is true {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}}.
> {code}
> if (itemInfo != null
>                 && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>               blockStorageMovementNeeded
>                   .add(storageMovementAttemptedResult.getTrackId());
>             ....................
>             ......................
>             } else {
>             ....................
>             ......................
>               this.sps.postBlkStorageMovementCleanup(
>                   storageMovementAttemptedResult.getTrackId());
>             }
> {code}
> 7. Now third DN registered with namenode and its reported one more DISK 
> replica. Now Namenode has two DISK and one ARCHIVE replica.
> In test case we have condition to check the number of DISK replica..
> {code} DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, 
> timeout, fs);{code}
> This condition never became true and test case will be timed out.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to