[ https://issues.apache.org/jira/browse/HDFS-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184333#comment-15184333 ]
Lin Yiqun commented on HDFS-9904:
---------------------------------

Thanks [~kihwal] for the concrete analysis. I had overlooked that.
{quote}
Also, it should be set before the namenode is started and should be reset for other test cases.
{quote}
In {{testCheckpointCancellationDuringUpload}}, all namenodes are already restarted after the configuration is changed, so I think resetting the configuration here is OK.
{code}
    // don't compress, we want a big image
    for (int i = 0; i < NUM_NNS; i++) {
      cluster.getConfiguration(i).setBoolean(
          DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, false);
    }

    // Throttle SBN upload to make it hang during upload to ANN
    for (int i = 1; i < NUM_NNS; i++) {
      cluster.getConfiguration(i).setLong(
          DFSConfigKeys.DFS_IMAGE_TRANSFER_RATE_KEY, 100);
    }
    for (int i = 0; i < NUM_NNS; i++) {
      cluster.restartNameNode(i);
    }
{code}
There seems to be a similar problem in {{testNonPrimarySBNUploadFSImage}}. When the first namenode transitions to standby, the 10 edits exceed the configured value of 5, so it will also do a checkpoint. But the checkpoint should actually be uploaded by one of the standby nodes.
{code}
    doEdits(0, 10);
    cluster.transitionToStandby(0);
{code}
Is my understanding right? If so, we can solve both problems in this jira. Finally, I have updated the patch to address your comments.

> testCheckpointCancellationDuringUpload occasionally fails
> ----------------------------------------------------------
>
>                 Key: HDFS-9904
>                 URL: https://issues.apache.org/jira/browse/HDFS-9904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.3
>            Reporter: Kihwal Lee
>         Attachments: HDFS-9904.001.patch
>
>
> The failure was at the end of the test case where the txid of the standby
> (former active) is checked. Since the checkpoint/uploading was canceled, it
> is not supposed to have the new checkpoint. Looking at the test log, that was
> still the case, but the standby then did checkpoint on its own and bumped up
> the txid, right before the check was performed.
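
The threshold reasoning in the comment about {{testNonPrimarySBNUploadFSImage}} (10 edits against a configured value of 5) can be illustrated with a small, self-contained Java sketch. This is not the HDFS implementation: the class and method names are hypothetical, and it assumes that a checkpoint is triggered once the number of uncheckpointed transactions reaches the configured count. Any time-based trigger is ignored.

```java
public class CheckpointThresholdSketch {

    /**
     * Hypothetical, simplified model of a transaction-count checkpoint trigger.
     * Assumption: a checkpoint is needed once the number of transactions
     * written since the last checkpoint reaches the configured threshold.
     */
    public static boolean needsCheckpoint(long lastCheckpointTxId,
                                          long lastWrittenTxId,
                                          long txnThreshold) {
        long uncheckpointed = lastWrittenTxId - lastCheckpointTxId;
        return uncheckpointed >= txnThreshold;
    }

    public static void main(String[] args) {
        long threshold = 5; // the configured value mentioned in the comment

        // 10 edits since the last checkpoint: above the threshold.
        System.out.println(needsCheckpoint(0, 10, threshold)); // prints "true"

        // 3 edits since the last checkpoint: below the threshold.
        System.out.println(needsCheckpoint(0, 3, threshold)); // prints "false"
    }
}
```

Under this assumption, a namenode that has just accumulated 10 edits and then becomes standby would be eligible to checkpoint on its own, which is the scenario the comment describes.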