[ https://issues.apache.org/jira/browse/HDFS-16143?focusedWorklogId=631631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631631 ]
ASF GitHub Bot logged work on HDFS-16143:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Jul/21 13:35
            Start Date: 30/Jul/21 13:35
    Worklog Time Spent: 10m
      Work Description: virajjasani commented on a change in pull request #3235:
URL: https://github.com/apache/hadoop/pull/3235#discussion_r679753606

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
##########
@@ -433,15 +440,28 @@ public void testStandbyTriggersLogRollsWhenTailInProgressEdits()
         NameNodeAdapter.mkdirs(active, getDirPath(i),
             new PermissionStatus("test", "test",
             new FsPermission((short)00755)), true);
+        // reset lastRollTimeMs in EditLogTailer.
+        active.getNamesystem().getEditLogTailer().resetLastRollTimeMs();

Review comment:
Thanks for taking a look @jojochuang.
`EditLogTailer` has a thread that keeps running to decide when it is time to trigger a log roll by calling the Active Namenode's rollEditLog() API.

```
  private void doWork() {
    long currentSleepTimeMs = sleepTimeMs;
    while (shouldRun) {
      long editsTailed = 0;
      try {
        // There's no point in triggering a log roll if the Standby hasn't
        // read any more transactions since the last time a roll was
        // triggered.
        boolean triggeredLogRoll = false;
        if (tooLongSinceLastLoad() &&
            lastRollTriggerTxId < lastLoadedTxnId) {
          triggerActiveLogRoll();
          triggeredLogRoll = true;
        }
        ...
        ...
```

While the test creates new dirs in this for loop, that thread keeps checking and intermittently triggers a log roll by making RPC calls to the Active Namenode. The test becomes flaky because it expects the Standby Namenode's last applied txn id to be less than the Active Namenode's last written txn id within a limited time window.

How long the EditLogTailer thread waits before triggering a log roll depends on `lastRollTimeMs`.
In the above code, tooLongSinceLastLoad() refers to:

```
  /**
   * @return true if the configured log roll period has elapsed.
   */
  private boolean tooLongSinceLastLoad() {
    return logRollPeriodMs >= 0 &&
      (monotonicNow() - lastRollTimeMs) > logRollPeriodMs;
  }
```

Hence, a log roll is not triggered until more than `logRollPeriodMs` has elapsed since `lastRollTimeMs`. The test has no control over how long the mkdir calls in this for loop take, and in the meantime `logRollPeriodMs` can easily elapse. When that happens, the thread in `EditLogTailer` rolls the log, and the expectation that the Standby Namenode's txnId is less than that of the Active Namenode no longer holds. That is why the test is flaky.

Hence, it is important for this test to keep resetting `lastRollTimeMs` while the mkdir calls are executing, so that `tooLongSinceLastLoad()` has no chance to return true until we want it to.
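To make the timing argument above concrete, here is a minimal, self-contained sketch of the same check driven by a fake clock. It is not the actual `EditLogTailer` code: the `RollTimer` class, the `countTriggers` helper, the 100 ms period and the 60 ms per-mkdir cost are all hypothetical, and `resetLastRollTimeMs()` is assumed to simply set `lastRollTimeMs` to the current time.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RollTimerSketch {

  /** Hypothetical stand-in for the roll-timing state kept by EditLogTailer. */
  static final class RollTimer {
    private final long logRollPeriodMs;
    private final AtomicLong clockMs; // fake monotonic clock owned by the caller
    private long lastRollTimeMs;

    RollTimer(long logRollPeriodMs, AtomicLong clockMs) {
      this.logRollPeriodMs = logRollPeriodMs;
      this.clockMs = clockMs;
      this.lastRollTimeMs = clockMs.get();
    }

    /** Same condition as the tooLongSinceLastLoad() quoted above. */
    boolean tooLongSinceLastLoad() {
      return logRollPeriodMs >= 0
          && (clockMs.get() - lastRollTimeMs) > logRollPeriodMs;
    }

    /** Assumed behaviour of the reset hook: restart the period from "now". */
    void resetLastRollTimeMs() {
      lastRollTimeMs = clockMs.get();
    }
  }

  /** Counts how often a roll would fire while five slow "mkdir" steps run. */
  static int countTriggers(boolean resetEachStep) {
    AtomicLong clock = new AtomicLong(0);
    RollTimer timer = new RollTimer(100, clock); // hypothetical 100 ms period
    int triggers = 0;
    for (int i = 0; i < 5; i++) {
      clock.addAndGet(60); // pretend one mkdir call took 60 ms
      if (resetEachStep) {
        timer.resetLastRollTimeMs(); // what the patched test does per mkdir
      }
      if (timer.tooLongSinceLastLoad()) {
        triggers++;
        timer.resetLastRollTimeMs(); // a roll restarts the period
      }
    }
    return triggers;
  }

  public static void main(String[] args) {
    System.out.println("rolls without reset: " + countTriggers(false));
    System.out.println("rolls with reset:    " + countTriggers(true));
  }
}
```

With these made-up numbers the unpatched loop crosses the period twice, while the loop that resets after every step never does, which is the effect the added `resetLastRollTimeMs()` call is meant to have in the test.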
Issue Time Tracking
-------------------

    Worklog Id:     (was: 631631)
    Time Spent: 3.5h  (was: 3h 20m)

> TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits is flaky
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16143
>                 URL: https://issues.apache.org/jira/browse/HDFS-16143
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Akira Ajisaka
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3229/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> {quote}
> [ERROR]
> testStandbyTriggersLogRollsWhenTailInProgressEdits[0](org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer)
> Time elapsed: 6.862 s <<< FAILURE!
> java.lang.AssertionError
>       at org.junit.Assert.fail(Assert.java:87)
>       at org.junit.Assert.assertTrue(Assert.java:42)
>       at org.junit.Assert.assertTrue(Assert.java:53)
>       at
> org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer.testStandbyTriggersLogRollsWhenTailInProgressEdits(TestEditLogTailer.java:444)
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)