David Ribeiro Alves has posted comments on this change.

Change subject: Fix flaky test TestRecoverFromOpIdOverflow
......................................................................

Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6808/1//COMMIT_MSG
Commit Message:

PS1, Line 9: This test is flaky because we race against the COMMIT message for the
           : first NO_OP in the WAL being written. It is currently hard to know when
           : the actual COMMIT message is written to the WAL so we use a workaround
           : to delete the first log segment before restarting the EMC in this test.
Can you explain a little better how we'd get an overflow here?


http://gerrit.cloudera.org:8080/#/c/6808/1/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

PS1, Line 368: // Before restarting the tablet server, delete the initial log segment from
             : // disk (the original leader election NO_OP) if it exists since it will
             : // contain OpId 1.1; if it doesn't also contain the COMMIT message for 1.1
             : // yet then it will trigger a CHECK complaining about non-sequential OpIds
             : // in the WAL at tablet bootstrap time.
If I understand what you're doing, it is that you remove the first segment to make sure the old op isn't committed on restart. Is that right? Can this happen in the wild?


PS1, Line 383: wal_children
You mean "wal_dir", right?


-- 
To view, visit http://gerrit.cloudera.org:8080/6808
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib382819307da04bb76d68d2c015dc0edd9f60267
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes