[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Todd Lipcon has posted comments on this change.

Change subject: Test for bug in exactly-once during tablet bootstrap

Patch Set 6: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/5417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Todd Lipcon has submitted this change and it was merged.

Change subject: Test for bug in exactly-once during tablet bootstrap

Test for bug in exactly-once during tablet bootstrap

Here's a regression test for the bug which is causing
raft_consensus-itest to occasionally think it has inserted 23 rows when
in fact it has only inserted 20.

The issue is in the rewriting of logs during bootstrap. If we do a write
which gets a duplicate key error, the first time the COMMIT message is
written, it includes the error. When the server restarts, it writes the
COMMIT message again with only 'flushed: true' in the commit message.
This is enough for bootstrap to know not to bother to replay it on
subsequent restarts, but it has lost the error messages themselves.

If the server restarts again, at this point it doesn't rebuild a proper
response, but instead puts an errorless response into the ResultTracker.

So, if an operation hits an error, and then the tablet server restarts
twice while the client is still retrying, the client will falsely think
that its operation has succeeded.

This includes a disabled regression test which shows the bug.

Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Reviewed-on: http://gerrit.cloudera.org:8080/5417
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon
---
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 64 insertions(+), 2 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified

Gerrit-MessageType: merged
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon
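
The restart sequence described in the commit message can be modelled with a small toy program. This is only a sketch of the mechanism, not Kudu code: CommitRecord, Response, Bootstrap, ClientSeesSuccessAfterRestarts and the preserve_errors flag are all hypothetical names invented for illustration, the ResultTracker here is just a std::map, and the "fix" branch is an assumption about one possible remedy rather than what was actually done upstream.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a COMMIT log entry (not Kudu's real message).
struct CommitRecord {
  int op_id;
  bool flushed;       // true once bootstrap has rewritten the entry
  std::string error;  // empty string means "no error recorded"
};

// Hypothetical stand-in for the response cached for a retrying client.
struct Response {
  std::string error;
  bool ok() const { return error.empty(); }
};

typedef std::vector<CommitRecord> Log;
typedef std::map<int, Response> ResultTracker;  // toy model, keyed by op id

// One server restart: rebuild the tracker from the log, then rewrite the log.
// With preserve_errors == false the rewritten entry keeps only "flushed",
// which is the behaviour the commit message describes. With
// preserve_errors == true the error is carried over (an assumed fix).
Log Bootstrap(const Log& old_log, ResultTracker* tracker, bool preserve_errors) {
  Log new_log;
  tracker->clear();
  for (size_t i = 0; i < old_log.size(); ++i) {
    const CommitRecord& rec = old_log[i];
    Response resp = {rec.error};
    (*tracker)[rec.op_id] = resp;
    CommitRecord rewritten = {rec.op_id, true,
                              preserve_errors ? rec.error : std::string()};
    new_log.push_back(rewritten);
  }
  return new_log;
}

// Returns true if a client retrying op 1 would be told the write succeeded
// after the given number of restarts. Op 1 originally hit a duplicate key.
bool ClientSeesSuccessAfterRestarts(int restarts, bool preserve_errors) {
  assert(restarts >= 1);
  Log log;
  CommitRecord original = {1, false, "key already present"};
  log.push_back(original);
  ResultTracker tracker;
  for (int i = 0; i < restarts; ++i) {
    log = Bootstrap(log, &tracker, preserve_errors);
  }
  return tracker.at(1).ok();
}
```

In this model, ClientSeesSuccessAfterRestarts(1, false) is false because the first restart still reads the original COMMIT with its error, while ClientSeesSuccessAfterRestarts(2, false) is true because the second restart reads the rewritten, errorless entry. That matches the "restarts twice while the client is still retrying" condition in the commit message.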
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/5417

to look at the new patch set (#6).

Change subject: Test for bug in exactly-once during tablet bootstrap

Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
---
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 64 insertions(+), 2 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/5417/6

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/5417

to look at the new patch set (#5).

Change subject: Test for bug in exactly-once during tablet bootstrap

Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
---
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 64 insertions(+), 2 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/5417/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
David Ribeiro Alves has posted comments on this change.

Change subject: Test for bug in exactly-once during tablet bootstrap

Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5417/4/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

Line 490: #define ANFF ASSERT_NO_FATAL_FAILURE
> We have NO_FATALS for this now.

Done

Gerrit-MessageType: comment
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Adar Dembo has posted comments on this change.

Change subject: Test for bug in exactly-once during tablet bootstrap

Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5417/4/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

Line 490: #define ANFF ASSERT_NO_FATAL_FAILURE

We have NO_FATALS for this now.

Gerrit-MessageType: comment
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
David Ribeiro Alves has posted comments on this change.

Change subject: Test for bug in exactly-once during tablet bootstrap

Patch Set 3:

just rebased this on top of current master

Gerrit-MessageType: comment
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
David Ribeiro Alves has posted comments on this change.

Change subject: Test for bug in exactly-once during tablet bootstrap

Patch Set 4: Code-Review+2

Gerrit-MessageType: comment
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/5417

to look at the new patch set (#3).

Change subject: Test for bug in exactly-once during tablet bootstrap

Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
---
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 67 insertions(+), 6 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/5417/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins