[jira] [Commented] (HDFS-9904) testCheckpointCancellationDuringUpload occasionally fails

Lin Yiqun (JIRA) Mon, 07 Mar 2016 19:38:32 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184333#comment-15184333
 ]


Lin Yiqun commented on HDFS-9904:
---------------------------------

Thanks [~kihwal] for the concrete analysis. I had overlooked that.
{quote}
Also, it should be set before the namenode is started and should be reset for 
other test cases.
{quote}
The method {{testCheckpointCancellationDuringUpload}} already restarts all 
namenodes afterwards, so I think it is OK to reset the configuration here.
{code}
    // don't compress, we want a big image
    for (int i = 0; i < NUM_NNS; i++) {
      cluster.getConfiguration(i).setBoolean(
          DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, false);
    }

    // Throttle SBN upload to make it hang during upload to ANN
    for (int i = 1; i < NUM_NNS; i++) {
      cluster.getConfiguration(i).setLong(
          DFSConfigKeys.DFS_IMAGE_TRANSFER_RATE_KEY, 100);
    }
    for (int i = 0; i < NUM_NNS; i++) {
      cluster.restartNameNode(i);
    }
{code}
There seems to be a similar problem in 
{{testNonPrimarySBNUploadFSImage}}. When the first namenode transitions to 
standby, it will also do a checkpoint, because 10 is bigger than 5 (the 
configured value). But the checkpoint should actually be uploaded by one of 
the standby nodes.
{code}
doEdits(0, 10);
cluster.transitionToStandby(0);
{code}
Is my understanding right? If so, we can solve both problems in this JIRA. I 
will then update the patch to address your comments.
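
To make the threshold reasoning above concrete, here is a minimal standalone sketch. It is not the Hadoop code; it only assumes that a standby checkpoints once its count of uncheckpointed transactions reaches the configured value (5 in this test), and that {{doEdits(0, 10)}} produces roughly 10 transactions. The class, constant and method names are hypothetical.

```java
public class CheckpointThresholdSketch {
    // Hypothetical stand-in for the configured checkpoint threshold
    // (the "5" referred to in the comment above).
    static final long CHECKPOINT_TXN_THRESHOLD = 5;

    // Assumed rule: a checkpoint is due once the number of uncheckpointed
    // transactions reaches the threshold.
    static boolean needsCheckpoint(long uncheckpointedTxns, long threshold) {
        return uncheckpointedTxns >= threshold;
    }

    public static void main(String[] args) {
        // Assumption: doEdits(0, 10) leaves about 10 uncheckpointed transactions.
        long editsOnFirstNamenode = 10;

        // prints "true": the former active would checkpoint on its own
        System.out.println(
            needsCheckpoint(editsOnFirstNamenode, CHECKPOINT_TXN_THRESHOLD));

        // prints "false": below the threshold, no checkpoint is triggered
        System.out.println(needsCheckpoint(3, CHECKPOINT_TXN_THRESHOLD));
    }
}
```

If this assumption holds, any namenode that becomes standby while holding more than 5 uncheckpointed transactions can checkpoint by itself, which would explain the txid bump seen right before the final check.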



> testCheckpointCancellationDuringUpload occasionally fails 
> ----------------------------------------------------------
>
>                 Key: HDFS-9904
>                 URL: https://issues.apache.org/jira/browse/HDFS-9904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.3
>            Reporter: Kihwal Lee
>         Attachments: HDFS-9904.001.patch
>
>
> The failure was at the end of the test case where the txid of the standby 
> (former active) is checked. Since the checkpoint/uploading was canceled, it 
> is not supposed to have the new checkpoint. Looking at the test log, that was 
> still the case, but the standby then did checkpoint on its own and bumped up 
> the txid, right before the check was performed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
