virajjasani commented on a change in pull request #3235:
URL: https://github.com/apache/hadoop/pull/3235#discussion_r679753606



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
##########
@@ -433,15 +440,28 @@ public void testStandbyTriggersLogRollsWhenTailInProgressEdits()
         NameNodeAdapter.mkdirs(active, getDirPath(i),
             new PermissionStatus("test", "test",
             new FsPermission((short)00755)), true);
+        // reset lastRollTimeMs in EditLogTailer.
+        active.getNamesystem().getEditLogTailer().resetLastRollTimeMs();

Review comment:
       Thanks for taking a look @jojochuang. 
   `EditLogTailer` runs a background thread that decides when to trigger a 
log roll by calling the Active Namenode's rollEditLog() API.
   ```
       private void doWork() {
         long currentSleepTimeMs = sleepTimeMs;
         while (shouldRun) {
           long editsTailed  = 0;
           try {
             // There's no point in triggering a log roll if the Standby hasn't
             // read any more transactions since the last time a roll was
             // triggered.
             boolean triggeredLogRoll = false;
             if (tooLongSinceLastLoad() &&
                 lastRollTriggerTxId < lastLoadedTxnId) {
               triggerActiveLogRoll();
               triggeredLogRoll = true;
             }
   ...
   ...
   ```
   
   What happens in this test is that, while we create new dirs in this for 
loop, this thread keeps checking and intermittently triggers a log roll by 
making RPC calls to the Active Namenode. The test therefore becomes flaky, 
because it expects the Standby Namenode's last applied txn id to be less than 
the Active Namenode's last written txn id within a specific time window (this 
is the only reason behind its flakiness). How long EditLogTailer's thread 
waits before triggering a log roll depends on `lastRollTimeMs`.
   
   In the above code, tooLongSinceLastLoad() refers to:
   ```
     /**
      * @return true if the configured log roll period has elapsed.
      */
     private boolean tooLongSinceLastLoad() {
       return logRollPeriodMs >= 0 && 
         (monotonicNow() - lastRollTimeMs) > logRollPeriodMs;
     }
   ```
   Hence, until `logRollPeriodMs` has elapsed since `lastRollTimeMs`, a log 
roll is not triggered. However, we have no control over how long the mkdir 
calls in this for loop take, and that much time can easily elapse in the 
meantime, which is why this test is flaky. When we expect the Standby 
Namenode's txnId to be less than that of the Active Namenode, it is not, 
because the log has already been rolled by the above thread in 
`EditLogTailer`.
   
   Hence, it is important for this test to keep resetting `lastRollTimeMs` 
while the mkdir calls are executing, so that `tooLongSinceLastLoad()` cannot 
return true until we want it to.
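
To make the timing argument concrete, here is a minimal, self-contained sketch of the check with a fake clock. It is not the actual `EditLogTailer` code: `RollTimerSketch`, `RollTimer`, `countDueChecks` and the `AtomicLong` clock are hypothetical names used only for illustration, and the body of `resetLastRollTimeMs()` is an assumption (it sets `lastRollTimeMs` to the current time). The sketch also does not model the roll itself updating the timer; it only counts how often the condition would become true.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RollTimerSketch {

  /** Hypothetical stand-in for the roll-timing state kept by the tailer. */
  static class RollTimer {
    private final long logRollPeriodMs;
    private final AtomicLong clockMs; // fake monotonic clock, advanced by the caller
    private long lastRollTimeMs;

    RollTimer(long logRollPeriodMs, AtomicLong clockMs) {
      this.logRollPeriodMs = logRollPeriodMs;
      this.clockMs = clockMs;
      this.lastRollTimeMs = clockMs.get();
    }

    /** Same shape as the check quoted above, using the fake clock. */
    boolean tooLongSinceLastLoad() {
      return logRollPeriodMs >= 0
          && (clockMs.get() - lastRollTimeMs) > logRollPeriodMs;
    }

    /** Assumed behaviour: restart the period from "now". */
    void resetLastRollTimeMs() {
      lastRollTimeMs = clockMs.get();
    }
  }

  /**
   * Simulates `ops` operations (think mkdir calls) that each cost `opCostMs`,
   * and returns how many times the roll condition was true after an operation.
   */
  static int countDueChecks(int ops, long opCostMs, long periodMs,
      boolean resetEachOp) {
    AtomicLong clock = new AtomicLong(0);
    RollTimer timer = new RollTimer(periodMs, clock);
    int due = 0;
    for (int i = 0; i < ops; i++) {
      clock.addAndGet(opCostMs); // the operation takes some time
      if (resetEachOp) {
        timer.resetLastRollTimeMs(); // what the test change does in the loop
      }
      if (timer.tooLongSinceLastLoad()) {
        due++;
      }
    }
    return due;
  }

  public static void main(String[] args) {
    // Five operations of 30 ms each against a 100 ms roll period.
    System.out.println("without reset: " + countDueChecks(5, 30, 100, false));
    System.out.println("with reset:    " + countDueChecks(5, 30, 100, true));
  }
}
```

Without the reset, the cumulative time of the operations eventually exceeds the period and the condition fires; with a reset after every operation, the elapsed time seen by the check never exceeds a single operation's cost, so the condition stays false for as long as the loop runs.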




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


