[ https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069958#comment-16069958 ]
Rakesh R commented on HDFS-11965:
---------------------------------

Thank you [~surendrasingh] for the patch; the test cases look good. Adding a few comments, please take care.
# I could see that all the functions operating over this map are synchronized. So we don't need {{AtomicInteger}}; a plain {{Integer}} is enough: {{private final Map<Long, Integer> lowRedundantFileRetryCount}}
# Instead of the {{under-replicated blocks}} terminology, could you please use {{low redundant blocks}}. {{isUnderReplicated}} can be changed to {{hasLowRedundancyBlocks()}}. Similarly, the other occurrences can be changed as well.
# {{int MAX_RETRY_FOR_UNER_REPLICATED_FILE = 10;}} - not sure whether this is too small. How about increasing the retries to 50 to give more chances?
# Logging has to be changed to reflect the retry case. Presently it will say SUCCESS and will mislead, right?
{code}
      } else {
        LOG.info(msg);
        //Check if file is under-replicated or some blocks are not
        //satisfy the policy. If file is under-replicate, SPS will
        //retry for some interval and wait for DN to report the block.
{code}
# It would be great if you could add the following unit test cases:
(a) EC unit tests in {{TestStoragePolicySatisfierWithStripedFile}} to cover the low redundant striped block logic.
(b) File blocks have extra redundant blocks. Here the verification point is that SPS should consider only the needed replica count for satisfying the storage policy. For example, the replication factor is 3, but the file has extra redundant blocks (2 additional replicas, 3 + 2 = 5 in total). After satisfying 3 replicas, SPS can mark the track as SUCCESS and remove the xattr.
(Rough sketches for comments 1, 4 and 5(b) are appended below the quoted issue description.)

> [SPS] Fix TestPersistentStoragePolicySatisfier#testWithCheckpoint failure
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11965
>                 URL: https://issues.apache.org/jira/browse/HDFS-11965
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-11965-HDFS-10285.001.patch, HDFS-11965-HDFS-10285.002.patch
>
> The test case is failing because all the required replicas are not moved to the expected storage. This happens because of a delay in DataNode registration after the cluster restart.
> Scenario:
> 1. Start the cluster with 3 DataNodes.
> 2. Create a file and set its storage policy to WARM.
> 3. Restart the cluster.
> 4. The NameNode and two DataNodes start first and get registered with the NameNode (one DataNode is not yet registered).
> 5. SPS schedules block movement based on the available DataNodes (it will move one replica to ARCHIVE based on the policy).
> 6. The block movement also succeeds and the xattr is removed from the file because this condition is true: {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}}.
> {code}
> if (itemInfo != null
>     && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>   blockStorageMovementNeeded
>       .add(storageMovementAttemptedResult.getTrackId());
>   ....................
>   ......................
> } else {
>   ....................
>   ......................
>   this.sps.postBlkStorageMovementCleanup(
>       storageMovementAttemptedResult.getTrackId());
> }
> {code}
> 7. Now the third DN registers with the NameNode and reports one more DISK replica, so the NameNode has two DISK and one ARCHIVE replica.
> In the test case we have a condition to check the number of DISK replicas:
> {code}
> DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, timeout, fs);
> {code}
> This condition never becomes true and the test case times out.
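A minimal sketch for comments 1 and 3, written as a small stand-alone helper class; the class, field and method names here are illustrative and not the actual patch code:
{code}
import java.util.HashMap;
import java.util.Map;

public class LowRedundancyRetryTracker {
  // Plain Integer is enough because every access to the map happens
  // inside synchronized methods of this class (comment 1).
  private final Map<Long, Integer> lowRedundantFileRetryCount = new HashMap<>();

  // A larger retry budget, 50 instead of 10, as suggested in comment 3.
  private static final int MAX_RETRY_FOR_LOW_REDUNDANT_FILE = 50;

  /** Bumps the retry count for a track id and reports whether retries remain. */
  public synchronized boolean incrementAndCheckRetry(long trackId) {
    Integer count = lowRedundantFileRetryCount.get(trackId);
    int next = (count == null) ? 1 : count + 1;
    lowRedundantFileRetryCount.put(trackId, next);
    return next <= MAX_RETRY_FOR_LOW_REDUNDANT_FILE;
  }

  /** Clears the counter once the file's storage policy is fully satisfied. */
  public synchronized void remove(long trackId) {
    lowRedundantFileRetryCount.remove(trackId);
  }
}
{code}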
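For comment 4, a rough idea of how the else branch could report a retry instead of SUCCESS. This is only a sketch: {{hasLowRedundancyBlocks()}} is the renamed method suggested in comment 2, {{retryTracker}} is the hypothetical helper above, and the remaining names are taken from the snippet quoted in the issue description:
{code}
long trackId = storageMovementAttemptedResult.getTrackId();
if (itemInfo != null && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
  blockStorageMovementNeeded.add(trackId);
} else if (itemInfo != null && itemInfo.hasLowRedundancyBlocks()) {
  // Some replicas are not reported yet; keep retrying instead of
  // logging SUCCESS, which would be misleading.
  if (retryTracker.incrementAndCheckRetry(trackId)) {
    LOG.info("Blocks of track id " + trackId + " are low redundant; "
        + "re-queuing and waiting for DNs to report the missing replicas.");
    blockStorageMovementNeeded.add(trackId);
  } else {
    LOG.warn("Giving up on track id " + trackId
        + " after exhausting retries for low redundant blocks.");
    this.sps.postBlkStorageMovementCleanup(trackId);
  }
} else {
  LOG.info(msg);
  this.sps.postBlkStorageMovementCleanup(trackId);
}
{code}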
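For test suggestion 5(b), a rough outline of the verification, assuming the test class's existing {{dfs}}/{{MiniDFSCluster}} setup, the branch's {{DistributedFileSystem#satisfyStoragePolicy}} call and the {{DFSTestUtil}} helpers already used in the SPS tests; the exact storage-type layout and the way the extra replicas are produced would need to follow the existing tests:
{code}
@Test(timeout = 300000)
public void testSPSWithExtraRedundantReplicas() throws Exception {
  // Replication factor 3, but the file ends up with 5 live replicas
  // (3 + 2 extra redundant ones).
  final Path file = new Path("/testExtraRedundantReplicas");
  DFSTestUtil.createFile(dfs, file, 1024, (short) 3, 0);
  // ... create the 2 extra redundant replicas here, e.g. by temporarily
  // raising the replication factor and lowering it back ...

  dfs.setStoragePolicy(file, "COLD");
  dfs.satisfyStoragePolicy(file);

  // SPS should satisfy the policy for the needed 3 replicas only, mark the
  // track as SUCCESS and remove the xattr, ignoring the extra replicas.
  DFSTestUtil.waitExpectedStorageType(file.toString(), StorageType.ARCHIVE, 3,
      30000, dfs);
}
{code}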