David Ribeiro Alves has posted comments on this change.

Change subject: Fix flaky test TestRecoverFromOpIdOverflow
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6808/1//COMMIT_MSG
Commit Message:

PS1, Line 9: This test is flaky because we race against the COMMIT message for the
           : first NO_OP in the WAL being written. It is currently hard to know when
           : the actual COMMIT message is written to the WAL so we use a workaround
           : to delete the first log segment before restarting the EMC in this test.
Can you explain a little better how we'd get an overflow here?


http://gerrit.cloudera.org:8080/#/c/6808/1/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

PS1, Line 368: // Before restarting the tablet server, delete the initial log segment from
             :     // disk (the original leader election NO_OP) if it exists since it will
             :     // contain OpId 1.1; if it doesn't also contain the COMMIT message for 1.1
             :     // yet then it will trigger a CHECK complaining about non-sequential OpIds
             :     // in the WAL at tablet bootstrap time.
If I understand correctly, you remove the first segment to make sure the old
op isn't committed on restart. Is that right? Can this happen in the wild?


PS1, Line 383: wal_children
You mean "wal_dir", right?


-- 
To view, visit http://gerrit.cloudera.org:8080/6808
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib382819307da04bb76d68d2c015dc0edd9f60267
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes
