[ https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069958#comment-16069958 ]
Rakesh R commented on HDFS-11965:
---------------------------------

Thank you [~surendrasingh] for the patch; the test cases look good. Adding a few comments, please take care.
# I could see that all the functions operating over this map are synchronized. So we don't need {{AtomicInteger}}; a plain {{Integer}} is enough: {{private final Map<Long, Integer> lowRedundantFileRetryCount}}
# Instead of the {{under-replicated blocks}} terminology, could you please use {{low redundant blocks}}. {{isUnderReplicated}} can be changed to {{hasLowRedundancyBlocks()}}. Similarly, the other occurrences can be changed as well.
# {{int MAX_RETRY_FOR_UNER_REPLICATED_FILE = 10;}} - not sure whether this is too small. How about increasing the retries to 50 to give more chances?
# Logging has to be changed to reflect the retry case. Presently it will say SUCCESS and will mislead, right?
{code}
      } else {
        LOG.info(msg);
        //Check if file is under-replicated or some blocks are not
        //satisfy the policy. If file is under-replicate, SPS will
        //retry for some interval and wait for DN to report the block.
{code}
# It would be great if you could add the following unit test cases:
(a) EC unit tests in {{TestStoragePolicySatisfierWithStripedFile}} to cover the low redundant striped block logic.
(b) File blocks have extra redundant blocks. Here the verification point is that SPS should consider only the needed replica count for satisfying the storage policy. For example, the replication factor is 3, but the file has extra redundant blocks (2 additional replicas, 3 + 2 = 5 in total). After satisfying 3 replicas, SPS can mark the track as SUCCESS and remove the xattr.
(Rough sketches for comments 1, 4 and 5(b) are appended below the quoted issue description.)

> [SPS] Fix TestPersistentStoragePolicySatisfier#testWithCheckpoint failure
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11965
>                 URL: https://issues.apache.org/jira/browse/HDFS-11965
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-11965-HDFS-10285.001.patch, HDFS-11965-HDFS-10285.002.patch
>
> The test case is failing because all the required replicas are not moved to the expected storage. This happens because of a delay in DataNode registration after the cluster restart.
> Scenario:
> 1. Start the cluster with 3 DataNodes.
> 2. Create a file and set its storage policy to WARM.
> 3. Restart the cluster.
> 4. The NameNode and two DataNodes start first and get registered with the NameNode (one DataNode is not yet registered).
> 5. SPS schedules block movement based on the available DataNodes (it will move one replica to ARCHIVE based on the policy).
> 6. The block movement also succeeds and the xattr is removed from the file because this condition is true: {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}}.
> {code}
> if (itemInfo != null
>     && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>   blockStorageMovementNeeded
>       .add(storageMovementAttemptedResult.getTrackId());
>   ....................
>   ......................
> } else {
>   ....................
>   ......................
>   this.sps.postBlkStorageMovementCleanup(
>       storageMovementAttemptedResult.getTrackId());
> }
> {code}
> 7. Now the third DN registers with the NameNode and reports one more DISK replica, so the NameNode has two DISK and one ARCHIVE replica.
> In the test case we have a condition to check the number of DISK replicas:
> {code}
> DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, timeout, fs);
> {code}
> This condition never becomes true and the test case times out.
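A minimal sketch for comments 1 and 3, written as a small stand-alone helper class; the class, field and method names here are illustrative and not the actual patch code:
{code}
import java.util.HashMap;
import java.util.Map;

public class LowRedundancyRetryTracker {
  // Plain Integer is enough because every access to the map happens
  // inside synchronized methods of this class (comment 1).
  private final Map<Long, Integer> lowRedundantFileRetryCount = new HashMap<>();

  // A larger retry budget, 50 instead of 10, as suggested in comment 3.
  private static final int MAX_RETRY_FOR_LOW_REDUNDANT_FILE = 50;

  /** Bumps the retry count for a track id and reports whether retries remain. */
  public synchronized boolean incrementAndCheckRetry(long trackId) {
    Integer count = lowRedundantFileRetryCount.get(trackId);
    int next = (count == null) ? 1 : count + 1;
    lowRedundantFileRetryCount.put(trackId, next);
    return next <= MAX_RETRY_FOR_LOW_REDUNDANT_FILE;
  }

  /** Clears the counter once the file's storage policy is fully satisfied. */
  public synchronized void remove(long trackId) {
    lowRedundantFileRetryCount.remove(trackId);
  }
}
{code}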
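For comment 4, a rough idea of how the else branch could report a retry instead of SUCCESS. This is only a sketch: {{hasLowRedundancyBlocks()}} is the renamed method suggested in comment 2, {{retryTracker}} is the hypothetical helper above, and the remaining names are taken from the snippet quoted in the issue description:
{code}
long trackId = storageMovementAttemptedResult.getTrackId();
if (itemInfo != null && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
  blockStorageMovementNeeded.add(trackId);
} else if (itemInfo != null && itemInfo.hasLowRedundancyBlocks()) {
  // Some replicas are not reported yet; keep retrying instead of
  // logging SUCCESS, which would be misleading.
  if (retryTracker.incrementAndCheckRetry(trackId)) {
    LOG.info("Blocks of track id " + trackId + " are low redundant; "
        + "re-queuing and waiting for DNs to report the missing replicas.");
    blockStorageMovementNeeded.add(trackId);
  } else {
    LOG.warn("Giving up on track id " + trackId
        + " after exhausting retries for low redundant blocks.");
    this.sps.postBlkStorageMovementCleanup(trackId);
  }
} else {
  LOG.info(msg);
  this.sps.postBlkStorageMovementCleanup(trackId);
}
{code}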
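For test suggestion 5(b), a rough outline of the verification, assuming the test class's existing {{dfs}}/{{MiniDFSCluster}} setup, the branch's {{DistributedFileSystem#satisfyStoragePolicy}} call and the {{DFSTestUtil}} helpers already used in the SPS tests; the exact storage-type layout and the way the extra replicas are produced would need to follow the existing tests:
{code}
@Test(timeout = 300000)
public void testSPSWithExtraRedundantReplicas() throws Exception {
  // Replication factor 3, but the file ends up with 5 live replicas
  // (3 + 2 extra redundant ones).
  final Path file = new Path("/testExtraRedundantReplicas");
  DFSTestUtil.createFile(dfs, file, 1024, (short) 3, 0);
  // ... create the 2 extra redundant replicas here, e.g. by temporarily
  // raising the replication factor and lowering it back ...

  dfs.setStoragePolicy(file, "COLD");
  dfs.satisfyStoragePolicy(file);

  // SPS should satisfy the policy for the needed 3 replicas only, mark the
  // track as SUCCESS and remove the xattr, ignoring the extra replicas.
  DFSTestUtil.waitExpectedStorageType(file.toString(), StorageType.ARCHIVE, 3,
      30000, dfs);
}
{code}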