[ https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060775#comment-16060775 ]
Rakesh R commented on HDFS-11965:
---------------------------------

Thanks [~surendrasingh] for working on this. Adding a sleep is one way to fix the problem, but I'm more inclined towards the 2nd proposal to fix the unit test failure. The real problem here is that SPS does not consider all of the block replica movements because of the delay in datanode registration and the corresponding block reports. Currently, SPS schedules block movement only for those locations/datanodes that are already registered, and once all of these scheduled replicas satisfy the policy it removes the {{sps xattr}} from the Inode. It never gives a chance to the replicas on datanodes whose registration and block reports are delayed. It is similar to a situation where the block is under-replicated, right?

I had an offline discussion with [~umamaheswararao]. How about, before removing the {{sps xattr}} from the Inode, checking whether all the block replicas have been given a chance to do the block movement, and if not, waiting for a certain period for the under-replication task to finish? (A rough sketch of such a check is appended after the quoted description below.)

> [SPS] Fix TestPersistentStoragePolicySatisfier#testWithCheckpoint failure
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11965
>                 URL: https://issues.apache.org/jira/browse/HDFS-11965
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>
> The test case is failing because not all of the required replicas are moved to the expected storage. This happens because of the delay in datanode registration after a cluster restart.
> Scenario:
> 1. Start the cluster with 3 DataNodes.
> 2. Create a file and set its storage policy to WARM.
> 3. Restart the cluster.
> 4. The Namenode and two DataNodes start first and get registered with the NameNode (one datanode is not yet registered).
> 5. SPS schedules block movement based on the available DataNodes (it will move one replica to ARCHIVE based on the policy).
> 6. The block movement succeeds and the Xattr is removed from the file because this condition is true: {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}}.
> {code}
> if (itemInfo != null
>     && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>   blockStorageMovementNeeded
>       .add(storageMovementAttemptedResult.getTrackId());
>   ....................
>   ......................
> } else {
>   ....................
>   ......................
>   this.sps.postBlkStorageMovementCleanup(
>       storageMovementAttemptedResult.getTrackId());
> }
> {code}
> 7. Now the third DN registers with the namenode and reports one more DISK replica, so the Namenode has two DISK and one ARCHIVE replica.
> The test case has a condition to check the number of DISK replicas:
> {code}
> DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, timeout, fs);
> {code}
> This condition never becomes true, so the test case times out.
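For illustration, here is a minimal, self-contained sketch of the proposed check. The {{ItemInfo}} fields and the {{requeue()}}/{{cleanup()}} helpers are hypothetical placeholders, not the actual SPS classes or methods; the real change would go into the attempted-result processing path shown in the quoted snippet above.
{code}
// Illustrative sketch only: ItemInfo, requeue() and cleanup() are
// hypothetical stand-ins, not the real SPS classes/methods.
public class SpsXattrRemovalSketch {

  static class ItemInfo {
    boolean allBlockLocsAttemptedToSatisfy;
    int expectedReplicas;  // replicas required for the file/policy
    int reportedReplicas;  // replicas known from registered DNs' block reports
  }

  /** Decide whether the sps xattr can be removed for this track id. */
  static void handleAttemptedResult(long trackId, ItemInfo itemInfo) {
    if (itemInfo == null) {
      cleanup(trackId);
      return;
    }
    // Existing condition: were all known block locations attempted?
    boolean allLocsAttempted = itemInfo.allBlockLocsAttemptedToSatisfy;
    // Proposed extra condition: are replicas still missing because some
    // datanodes have not registered / sent block reports yet?
    boolean underReported = itemInfo.reportedReplicas < itemInfo.expectedReplicas;

    if (!allLocsAttempted || underReported) {
      requeue(trackId);  // retry after the remaining DNs report in
    } else {
      cleanup(trackId);  // safe to drop the sps xattr now
    }
  }

  static void requeue(long trackId) {
    System.out.println("Re-queue track " + trackId + " for another scan");
  }

  static void cleanup(long trackId) {
    System.out.println("Remove sps xattr and clean up track " + trackId);
  }

  public static void main(String[] args) {
    ItemInfo info = new ItemInfo();
    info.allBlockLocsAttemptedToSatisfy = true;  // step 6 of the scenario
    info.expectedReplicas = 3;                   // 3 replicas of the test file
    info.reportedReplicas = 2;                   // third DN not registered yet
    handleAttemptedResult(1L, info);             // prints the re-queue message
  }
}
{code}
With the extra under-reported condition, step 6 of the scenario would re-queue the track instead of removing the {{sps xattr}}, so SPS gets another chance to schedule the movement once the third DN registers and reports its DISK replica.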