[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has removed a vote on this change. Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 3: Verified+1 unrelated java test failure -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 07:32:50 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: allow aborting after beginning to commit
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17022 ) Change subject: KUDU-2612: allow aborting after beginning to commit .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/integration-tests/txn_commit-itest.cc File src/kudu/integration-tests/txn_commit-itest.cc: http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/integration-tests/txn_commit-itest.cc@456 PS3, Line 456: // The transaction should eventually succeed, treating the deleted : // participant as committed. nit: update? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/integration-tests/txn_commit-itest.cc@694 PS3, Line 694: const nit: constexpr? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/integration-tests/txn_commit-itest.cc@708 PS3, Line 708: ASSERT_OK(CountRows(&num_rows)); : ASSERT_EQ(initial_row_count_, num_rows); readability nit: move this up to be right after the 'for()' cycle? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/integration-tests/txn_commit-itest.cc@729 PS3, Line 729: LOG(INFO) << Substitute("Expecting $0 rows from $1 committed transactions", : expected_rows, num_committed_txns.load()); Is it needed for some test automation or benchmarking? Consider removing this since our tests are very chatty already because of a lot of logging output from cluster components. http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/transactions.proto File src/kudu/transactions/transactions.proto: http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/transactions.proto@22 PS3, Line 22: enum TxnStatePB { : UNKNOWN = 0; : OPEN = 1; : ABORT_IN_PROGRESS = 5; : ABORTED = 2; : COMMIT_IN_PROGRESS = 3; : FINALIZE_IN_PROGRESS = 6; : COMMITTED = 4; : } Do you mind adding a comment to show list of acceptable sequences of state changes for a transaction? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager-test.cc File src/kudu/transactions/txn_status_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager-test.cc@809 PS3, Line 809: s = txn_manager_->AbortTransaction(kTxnId2, kOwner, &ts_error); : ASSERT_TRUE(s.IsIllegalState()) << s.ToString(); I'm not sure I understand this: I thought it would be possible to call this before complete CompleteCommitTransaction is called, no? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager.h@148 PS3, Line 148: abort_txn_ = TxnStatePB::ABORT_IN_PROGRESS; nit: is it acceptable to have this called when abort_txn_ is already TxnStatePB::ABORTED? If not, maybe it makes sense to add a DCHECK() to enforce such precondition? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager.h@160 PS3, Line 160: commit_timestamp_ = commit_timestamp; nit: is it acceptable for this method to be called multiple times? If not, maybe it makes sense to add a DCHECK() on kInvalidTimestamp? -- To view, visit http://gerrit.cloudera.org:8080/17022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1b6596df2db5601f7e17e528ad6dc68057b67f8 Gerrit-Change-Number: 17022 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Feb 2021 07:31:57 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17128 to look at the new patch set (#3). Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. [client-test] more robust TestRetrieveAuthzTokenInParallel I saw a flakiness in the ClientTest.TestRetrieveAuthzTokenInParallel scenario, where about 25 out of 256 runs failed when running with --stress_cpu_threads=16: src/kudu/client/client-test.cc:6923: Failure Expected: (num_reqs) < (kThreads), actual: 21 vs 20 This patch addresses the issue by: * skipping the scenario when --stress_cpu_threads is not zero * skipping the scenario when running at single-core CPU machines * limiting the number of threads up to the number of CPU cores Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 --- M src/kudu/client/client-test.cc 1 file changed, 15 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/17128/3 -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17128/2//COMMIT_MSG Commit Message: PS2: Surprisingly, it still fails: src/kudu/client/client-test.cc:6945: Failure Expected: (num_reqs) < (kThreads), actual: 5 vs 4 -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 06:13:28 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Right: I meant the case when either the participant is registered, but BEGI Right: I meant the case when a participant is registered, but BEGIN_TXN hasn't been issued (say, tablet server with corresponding TxnOpDispatcher has crashed), and corresponding client doesn't retry its write operation (e.g., it has timed out). I'm curious how to get out of such situation? BTW, TxnOpDispatcher clears its ops queue when receiving ABORT_TXN by responding with Status::Aborted() to all pending write requests. So, it doesn't sent ServiceUnavailable() when receiving ABORT_TXN and pending operations are present in its queue. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 06:04:35 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17128/1/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17128/1/src/kudu/client/client-test.cc@6912 PS1, Line 6912: kNumCPUs > nit: why not keep the name kThreads? The rest of the test seems to care mor Done http://gerrit.cloudera.org:8080/#/c/17128/1/src/kudu/client/client-test.cc@6913 PS1, Line 6913: > > Should be < Whoops, indeed. -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 05:31:26 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17128 to look at the new patch set (#2). Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. [client-test] more robust TestRetrieveAuthzTokenInParallel I saw a flakiness in the ClientTest.TestRetrieveAuthzTokenInParallel scenario, where about 25 out of 256 runs failed when running with --stress_cpu_threads=16: src/kudu/client/client-test.cc:6923: Failure Expected: (num_reqs) < (kThreads), actual: 21 vs 20 This patch addresses the issue by skipping the scenario when running at single-core CPU machines and limiting the number of threads up to the number of CPU cores. Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 --- M src/kudu/client/client-test.cc 1 file changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/17128/2 -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Do you mean a scenario in which we haven't successfully called BEGIN_TXN (o Right: I meant the case when either the participant is registered, but BEGIN_TXN hasn't been issued (say, tablet server with corresponding TxnOpDispatcher has crashed) and that particular client doesn't retry it's write operation because it has timed out. In that case, how is aborting such a transaction would work? -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 05:21:07 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > As a second thought, I realized I don't clearly see how we can cleanly abor Do you mean a scenario in which we haven't successfully called BEGIN_TXN (or it's still in progress)? In that case, would there still be an active dispatcher? I thought in that case ABORT_TXN and all other participant ops would just be retried, since the dispatcher would return ServiceUnavailable? -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 04:59:55 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17128/1/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17128/1/src/kudu/client/client-test.cc@6912 PS1, Line 6912: kNumCPUs nit: why not keep the name kThreads? The rest of the test seems to care more about the number of threads executing rather than about the hardware. http://gerrit.cloudera.org:8080/#/c/17128/1/src/kudu/client/client-test.cc@6913 PS1, Line 6913: > Should be < -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 04:24:28 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17128/1//COMMIT_MSG Commit Message: PS1: As for the motivation, I put this patch together because I found too much noise from the failures in this test scenario while testing my recent changes in client-test.cc in the scope of https://gerrit.cloudera.org/#/c/17037/ -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 03:56:53 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17128 Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. [client-test] more robust TestRetrieveAuthzTokenInParallel I saw a flakiness in the ClientTest.TestRetrieveAuthzTokenInParallel scenario, where about 25 out of 256 runs failed when running with --stress_cpu_threads=16: src/kudu/client/client-test.cc:6923: Failure Expected: (num_reqs) < (kThreads), actual: 21 vs 20 This patch addresses the issue by skipping the scenario when running at single-core CPU machines and limiting the number of threads up to the number of CPU cores. Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 --- M src/kudu/client/client-test.cc 1 file changed, 12 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/17128/1 -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 ) Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. Patch Set 3: Verified+1 Unrelated failure of HmsConfigurations/MasterFailoverTest.TestMasterUUIDResolution/1 -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 03:22:25 + Gerrit-HasComments: No
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Andrew Wong has removed a vote on this change. Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [client-test] added WriteWhileRestartingMultipleTabletServers
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17120 ) Change subject: [client-test] added WriteWhileRestartingMultipleTabletServers .. [client-test] added WriteWhileRestartingMultipleTabletServers While working on tests for multi-row transactions, I added a similar scenario and found it a bit flaky. So, I wanted to see if the same is true for the non-transactional write operations. It turns out the latter is pretty stable, so I need to dig in to find the root cause of the former. Anyways, I think this is a good scenario to add into client-test.cc, extending already existing scenario ClientTest.TestWriteWhileRestarting to multi-replica case and going through multiple restarts. Another important detail is that the newly added test scenario verifies the number of persisted rows in the end. I also did a small touch-up of the code related to the utility method ClientTest::CountRowsFromClient(). Change-Id: I95d1456dea2e6e2bb7d8b0c5d05e95798098710d Reviewed-on: http://gerrit.cloudera.org:8080/17120 Tested-by: Alexey Serbin Reviewed-by: Andrew Wong --- M src/kudu/client/client-test.cc 1 file changed, 105 insertions(+), 42 deletions(-) Approvals: Alexey Serbin: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I95d1456dea2e6e2bb7d8b0c5d05e95798098710d Gerrit-Change-Number: 17120 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [client-test] added WriteWhileRestartingMultipleTabletServers
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17120 ) Change subject: [client-test] added WriteWhileRestartingMultipleTabletServers .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17120/1/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17120/1/src/kudu/client/client-test.cc@7912 PS1, Line 7912: for (auto mode : {KuduScanner::READ_LATEST, KuduScanner::READ_YOUR_WRITES}) { > At first I thought this could lead to flakiness because there could be a le Yes, there isn't any flakiness: it's LEADER_ONLY replica selection. If that's a former leader becomes stale, it has all the written rows because the same client is used to scan the rows as was use to write them. As for the original txn-related issue mentioned in the description of this patch, I guess the issue should be fixed once https://gerrit.cloudera.org/#/c/17037/10/src/kudu/integration-tests/txn_write_ops-itest.cc@1319 passes (I hope https://gerrit.cloudera.org/#/c/17127/ should fix it). -- To view, visit http://gerrit.cloudera.org:8080/17120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95d1456dea2e6e2bb7d8b0c5d05e95798098710d Gerrit-Change-Number: 17120 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 03:09:38 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN As a second thought, I realized I don't clearly see how we can cleanly abort a transaction which faltered at one of its participants and doesn't have metadata about started transaction if ABORT_TXN returns ILLEGAL_STATE as well. Is it going to be addressed in a separate patch or https://gerrit.cloudera.org/#/c/17022/ will take care of this as well? -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 03:00:25 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: Code-Review+2 (1 comment) Thank you for fixing this! http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@248 PS1, Line 248: TestTransactionNotFound nit: does it makes sense to rename this? Or, maybe, at least add some text blurb to explain what these sub-scenarios mean in the bigger picture -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 02:50:59 + Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Hello Attila Bukor, Kudu Jenkins, Grant Henke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17124 to look at the new patch set (#3). Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING Prior to this patch, if a tablet server were quiescing for a prolonged period, scan requests could time out, complaining that the tablet server is quiescing, but without ever retrying the scan at another tablet server. This is because tablet servers will return TABLET_NOT_RUNNING to clients when attempting a scan while quiescing. The behavior in the C++ client is that the location is then blacklisted and the request is retried elsewhere. The behavior in the Java client, though, is that the same location is retried until failure. This patch addresses this by treating TABLET_NOT_RUNNING errors in the Java client as we would for TABLET_NOT_FOUND, which is actually quite similar to the handling for TABLET_NOT_RUNNING in the C++ client: the location is invalidated for further attempts, and the request is retried elsewhere. Why not just have quiescing tablet servers return TABLET_NOT_FOUND, then? TABLET_NOT_FOUND errors in the C++ client actually have some behavior not present in the Java client: a tablet whose location is invalidated with TABLET_NOT_FOUND in the C++ client will be required to be looked up again, requiring a round trip to the master. This behavior doesn't exist in the Java client, so I thought it easiest to piggyback on TABLET_NOT_FOUND handling for now. Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java 3 files changed, 57 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/3 -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [client-test] added WriteWhileRestartingMultipleTabletServers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17120 ) Change subject: [client-test] added WriteWhileRestartingMultipleTabletServers .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/17120/1/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17120/1/src/kudu/client/client-test.cc@7912 PS1, Line 7912: for (auto mode : {KuduScanner::READ_LATEST, KuduScanner::READ_YOUR_WRITES}) { At first I thought this could lead to flakiness because there could be a leadership change in between writing and reading, and we could end up scanning a stale leader. I don't think that's actually the case though, since if we're scanning a "stale follower", it must have changed leadership after having written all rows successfully, since we're using the same client that wrote the rows. -- To view, visit http://gerrit.cloudera.org:8080/17120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95d1456dea2e6e2bb7d8b0c5d05e95798098710d Gerrit-Change-Number: 17120 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 01:57:22 + Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Hello Attila Bukor, Kudu Jenkins, Grant Henke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17124 to look at the new patch set (#2). Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING Prior to this patch, if a tablet server were quiescing for a prolonged period, scan requests could time out, complaining that the tablet server is quiescing, but without ever retrying the scan at another tablet server. This is because tablet servers will return TABLET_NOT_RUNNING to clients when attempting a scan while quiescing. The behavior in the C++ client is that the location is then blacklisted and the request is retried elsewhere. The behavior in the Java client, though, is that the same location is retried until failure. This patch addresses this by treating TABLET_NOT_RUNNING errors in the Java client as we would for TABLET_NOT_FOUND, which is actually quite similar to the handling for TABLET_NOT_RUNNING in the C++ client: the location is invalidated for further attempts, and the request is retried elsewhere. Why not just have quiescing tablet servers return TABLET_NOT_FOUND, then? TABLET_NOT_FOUND errors in the C++ client actually have some behavior not present in the Java client: a tablet whose location is invalidated with TABLET_NOT_FOUND in the C++ client will be required to be looked up again, requiring a round trip to the master. This behavior doesn't exist in the Java client, so I thought it easiest to piggyback on TABLET_NOT_FOUND handling for now. Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java 3 files changed, 57 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/2 -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/16752/6/src/kudu/integration-tests/fuzz-itest.cc File src/kudu/integration-tests/fuzz-itest.cc: http://gerrit.cloudera.org:8080/#/c/16752/6/src/kudu/integration-tests/fuzz-itest.cc@2044 PS6, Line 2044: F(FuzzTest, TestDo > nit: is it crucial to have INSERT_INGORE instead of INSERT here? Do you mi Most of these cases were found by random fuzzing, and I just happened to hit a crash with INSERT_IGNORE. INSERT works too, so I'll update it to be less conspicuous. http://gerrit.cloudera.org:8080/#/c/16752/6/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/16752/6/src/kudu/tablet/compaction.cc@366 PS6, Line 366: nst Mutation* right_last = right. > Here and below: is AdvanceToLastInList really supposed to be under the #ifn Yeah, it's only necessary for the DCHECK. If we're not running the validation, there's no point in iterating. http://gerrit.cloudera.org:8080/#/c/16752/8/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/16752/8/src/kudu/tablet/compaction.cc@369 PS8, Line 369: if (PREDICT_TRUE(FLAGS_dcheck_on_kudu_2233_invariants)) { > nit: if this validation is empty in NDEBUG case, maybe put the whole 'if()' Good point. Done -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Feb 2021 00:59:48 + Gerrit-HasComments: Yes
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16752 to look at the new patch set (#9). Change subject: tablet: allow interleaving of row liveness between compaction input rows .. tablet: allow interleaving of row liveness between compaction input rows It was previously true that when merging multiple rowsets, a row's liveness always moved to the same or newer rowsets. I.e., if a row was deleted and inserted again, the new insert would either land in the same rowset from which it was deleted (e.g. if deleting from the MRS and inserting back into the same MRS), or a rowset whose rows are newer than the rowset being deleted from (e.g. if deleting from a DRS and inserting into the MRS). This invariant was upheld by various checks in compaction code. The invariant is no longer true when considering transactional inserts. Take the following example: - ts=10: insert 'a' to the tablet's MRS - ts=11: delete 'a' from the tablet's MRS - begin txn - insert 'a' to a transactional MRS - ts=12: commit the transactional MRS - ts=13: delete 'a' from the transactional MRS - ts=14: insert 'a' to the tablet's MRS In considering the row history for 'a' in the context of defining the row's history, we're left with the following row histories in each rowset, which serve as an input to our history-merging code: tablet MRS: UNDO(del@10) <- 'a' -> REDO(del@11) -> REDO(reins@14) txn MRS:UNDO(del@12) <- 'a' -> REDO(del@13) Previously, our invariant meant that one of these rows entire history (both UNDOs and REDOs) was entirely ahead of the other. This meant that merging was a matter of determining which input is older (which must be a deleted "ghost" row, since there can only be a single live row at a time), converting the older row and its history into UNDOs, and merging that UNDO history with the newer row's UNDOs. Given the above sequence, defining an older row isn't as straightforward, given the overlapping of the histories. This led to broken invariants in the form of CHECK failures or incorrect results. However, a notable obvservation is that there _is_ a discernable history that does abide by our previously held invariant, if we transfer the newer REDOs from the tablet's MRS input to the txn's MRS input: UNDO(del@10) <- 'a' -> REDO(del@11) UNDO(del@12) <- 'a' -> REDO(del@13) -> REDO(reins@14) This transferral is safe because we still expect that only a single instance of a row can be "live" at a time. I.e., if there is such overlap, it is caused by the deletion of a row from the older rowset, and in transferring the history, there is no risk of ending up with two histories for the same row that both end with a live row. This patch implements this transferral of history onto the newer input, and adds some fuzz test cases that helped snuff out the aforementioned CHECK failures and incorrect results. It also enables transactional ops in fuzz-itest, which helped find this issue. Without this patch, fuzz-itest with transactional support fails 2/10 times in DEBUG mode. With it, it has passed 1000/1000 times. Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a --- M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/mutation.h 5 files changed, 612 insertions(+), 83 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16752/9 -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] test: add more natural test for KUDU-2233
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77 PS2, Line 77: DEFINE_bool(dcheck_on_kudu_2233_invariants, true > I can surround this with some ifndef sure, but we do have several test-only Left this as is for now. http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@78 PS2, Line 78: caused by KUDU-2233 > nit: maybe specify this is test only as well. Done -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 00:53:30 + Gerrit-HasComments: Yes
[kudu-CR] test: add more natural test for KUDU-2233
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17114 to look at the new patch set (#3). Change subject: test: add more natural test for KUDU-2233 .. test: add more natural test for KUDU-2233 I have a patch in-flight that touches an area of the code around where we expect the infamous KUDU-2233 crash. Before merging it, I'd like to ensure the graceful handling of this corruption is unaffected, especially in cases where data has previously been corrupted and we've just upgraded to a newer version of Kudu. This patch adds such a test case, where data is corrupted but does not result in a crash, and the tserver is restarted with bits that handle corruption gracefully. In doing so, this patch also adds an off switch to all of the guardrails we installed for KUDU-2233. Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa --- M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_replica.cc 7 files changed, 160 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/17114/3 -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has removed Anonymous Coward (314) from this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Removed reviewer null. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Hello Alexey Serbin, Anonymous Coward (314), I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/17127 to review the following change. Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run In testing an in-flight patch[1], it was found that our current handling of errors when participant registration hasn't completed can lead to incorrect behavior: the path in which we handle NOT_FOUND errors while committing expects that such errors only occur if the tablet has been deleted, and we currently proceed with the commit. The incorrect assumption here was that NOT_FOUND errors are only produced when the tablet has been deleted, which is not the case. This behavior will be amended in an upcoming patch[2], and we'll actually abort the transaction on NOT_FOUND errors. Until then, this patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND, so at least the error type is consistent with the rest of the participant op lifecycle errors. [1] https://gerrit.cloudera.org/c/17037/ [2] https://gerrit.cloudera.org/c/17022 Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d --- M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.h 2 files changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/17127/1 -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (314)
[kudu-CR] [test][master] KUDU-3249 Recover dead master at the same HostPort
Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17126 Change subject: [test][master] KUDU-3249 Recover dead master at the same HostPort .. [test][master] KUDU-3249 Recover dead master at the same HostPort This change uses the Add/Remove master primitives added earlier to simulate recovery for a dead master at the same HostPort but different uuid. Test only code change but involves bunch of refactoring to re-use parts of existing test. Change-Id: Iac65201fd39ede5e918025ca7cf6a852a92d6eec --- M src/kudu/master/dynamic_multi_master-test.cc 1 file changed, 308 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/17126/1 -- To view, visit http://gerrit.cloudera.org:8080/17126 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iac65201fd39ede5e918025ca7cf6a852a92d6eec Gerrit-Change-Number: 17126 Gerrit-PatchSet: 1 Gerrit-Owner: Bankim Bhavsar
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17037 to look at the new patch set (#10). Change subject: KUDU-2612 tablet servers automatically register txn participants .. KUDU-2612 tablet servers automatically register txn participants With this patch, tablet servers automatically register tablets as transaction participants and issue appropriate BEGIN_TXN operations upon receiving write operations targeting tablet replicas which they host. Internally, the newly introduced logic is mostly embedded into the TxnOpDispatcher class. Every TxnOpDispatcher object is allowed to accumulate up to a certain number of pending write requests in its queue while awaiting for the completion of the preliminary tasks mentioned above. Once the TxnOpDispatcher's queue is at capacity, a tablet server starts responding with the ErrorStatusPB::ERROR_SERVER_TOO_BUSY error code to incoming write requests in the context of the corresponding transaction, and Kudu clients automatically retry requests, as expected. The capacity of the TxnOpDispatcher's queue is controlled by a newly introduced --tablet_max_pending_txn_write_ops flag. By default, the flag is set to 2. Buffering a few write requests should help to avoid needless retries in case if a client packs many operations into one write request: that's exactly the behavior for Kudu client sessions using the AUTO_FLUSH_BACKGROUND mode. If such buffering isn't desired for some reason, set --tablet_max_pending_txn_write_ops=0: in that case a client will retry the very first operation sent to a tablet server in the context of a transaction until the tablet server completes the preliminary tasks mentioned above. The flag has runtime semantics, so no restart of a tablet server is required upon modification of the flag's setting. Each tablet replica maintains a txn_id --> TxnOpDispatcher map, with an entry's lifecycle as below: * An entry is added upon receiving write request in the context of a multi-row transaction. * An entry is removed upon applying either ParticipantOpPB::ABORT_TXN or ParticipantOpPB::FINALIZE_COMMIT operation. * If a write request is received after transaction has been committed or aborted, the entry is automatically removed once receiving corresponding error response from any of the following components: ** from TxnStatusManager in at attempts to register a participant in the context of committed/aborted transaction ** from the replica itself in an attempt to add ParticipantOpPB::BEGIN_COMMIT operation In other words, the system automatically gets rid of no-longer-needed TxnOpDispatcher entries. This patch also contains several test scenarios to cover the newly introduced functionality. I also updated other related tests to remove the artificial registration of corresponding transaction participants, so those tests now rely on the newly introduced functionality. Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 --- M src/kudu/client/batcher.cc M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc A src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 12 files changed, 2,317 insertions(+), 126 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/17037/10 -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 10 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17124 Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING Prior to this patch, if a tablet server were quiescing for a prolonged period, scan requests could time out, complaining that the tablet server is quiescing, but without ever retrying the scan at another tablet server. This is because tablet servers will return TABLET_NOT_RUNNING to clients when attempting a scan while quiescing. The behavior in the C++ client is that the location is then blacklisted and the request is retried elsewhere. The behavior in the Java client, though, is that the same location is retried until failure. This patch addresses this by treating TABLET_NOT_RUNNING errors in the Java client as we would for TABLET_NOT_FOUND, which is actually quite similar to the handling for TABLET_NOT_RUNNING in the C++ client: the location is invalidated for further attempts, and the request is retried elsewhere. Why not just have quiescing tablet servers return TABLET_NOT_FOUND, then? TABLET_NOT_FOUND errors in the C++ client actually have some behavior not present in the Java client: a tablet whose location is invalidated with TABLET_NOT_FOUND in the C++ client will be required to be looked up again, requiring a round trip to the master. This behavior doesn't exist in the Java client, so I thought it easiest to piggyback on TABLET_NOT_FOUND handling for now. Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java 3 files changed, 57 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/1 -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong
[kudu-CR] KUDU-2612: allow aborting after beginning to commit
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17022 ) Change subject: KUDU-2612: allow aborting after beginning to commit .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17022/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17022/3//COMMIT_MSG@10 PS3, Line 10: FINALIZE_IN_PROGRESS state to serve as an intermediate step between : COMMIT_IN_PROGRESS Can you briefly describe the difference between COMMIT_IN_PROGRESS and FINALIZE_IN_PROGRESS. e.g. COMMIT_IN_PROGRESS covers the process from BeginCommit to Finalize. And FINALIZE_IN_PROGRESS covers the process from Finalize to Committed? Also for my own understanding once in FINALIZE_IN_PROGRESS state the transaction cannot be aborted? http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/17022/3/src/kudu/transactions/txn_status_manager.h@155 PS3, Line 155: DCHECK_EQ(TxnStatePB::OPEN, abort_txn_); I don't quite understand why not check TxnStatePB::ABORT_IN_PROGRESS instead of OPEN, Since ABORT_IN_PROGRESS is the previous state of Aborted? -- To view, visit http://gerrit.cloudera.org:8080/17022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1b6596df2db5601f7e17e528ad6dc68057b67f8 Gerrit-Change-Number: 17022 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 25 Feb 2021 19:57:09 + Gerrit-HasComments: Yes
[kudu-CR] [thirdparty] Update thirdparty licenses
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17123 ) Change subject: [thirdparty] Update thirdparty licenses .. [thirdparty] Update thirdparty licenses With some recent changes to the gtest and curl builds in thirdparty there are some license changes that should be made. I noticed these required changes as a result of a failing kudu-binary jar license check. Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 Reviewed-on: http://gerrit.cloudera.org:8080/17123 Tested-by: Kudu Jenkins Reviewed-by: Bankim Bhavsar --- M thirdparty/LICENSE.txt 1 file changed, 3 insertions(+), 3 deletions(-) Approvals: Kudu Jenkins: Verified Bankim Bhavsar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 Gerrit-Change-Number: 17123 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16752/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/16752/2/src/kudu/tablet/compaction.cc@453 PS2, Line 453: / If the last chang > We discussed this on Slack, that I don't think it's a valid sequence in ter Ack -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 25 Feb 2021 19:00:49 + Gerrit-HasComments: Yes
[kudu-CR] [thirdparty] Update thirdparty licenses
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17123 ) Change subject: [thirdparty] Update thirdparty licenses .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/17123/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17123/1//COMMIT_MSG@11 PS1, Line 11: failing kudu-binary jar : license check. > It runs as a part of the mini-cluster build (aka kudu-binary jar build). Ack -- To view, visit http://gerrit.cloudera.org:8080/17123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 Gerrit-Change-Number: 17123 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 25 Feb 2021 18:46:39 + Gerrit-HasComments: Yes
[kudu-CR] test: add more natural test for KUDU-2233
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 2: Code-Review+1 (1 comment) LGTM, thanks a lot for providing this patch test upgrade cases. http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@78 PS2, Line 78: caused by KUDU-2233 nit: maybe specify this is test only as well. -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 25 Feb 2021 18:45:36 + Gerrit-HasComments: Yes
[kudu-CR] [thirdparty] Update thirdparty licenses
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17123 ) Change subject: [thirdparty] Update thirdparty licenses .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17123/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17123/1//COMMIT_MSG@11 PS1, Line 11: failing kudu-binary jar : license check. > How to run this license check and is it something that's run as part of som It runs as a part of the mini-cluster build (aka kudu-binary jar build). ./build-support/mini-cluster/build_mini_cluster_binaries.sh -- To view, visit http://gerrit.cloudera.org:8080/17123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 Gerrit-Change-Number: 17123 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 25 Feb 2021 17:50:37 + Gerrit-HasComments: Yes
[kudu-CR] [thirdparty] Update thirdparty licenses
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17123 ) Change subject: [thirdparty] Update thirdparty licenses .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17123/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17123/1//COMMIT_MSG@11 PS1, Line 11: failing kudu-binary jar : license check. How to run this license check and is it something that's run as part of some job? -- To view, visit http://gerrit.cloudera.org:8080/17123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 Gerrit-Change-Number: 17123 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 25 Feb 2021 17:49:18 + Gerrit-HasComments: Yes
[kudu-CR] [thirdparty] Update thirdparty licenses
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17123 Change subject: [thirdparty] Update thirdparty licenses .. [thirdparty] Update thirdparty licenses With some recent changes to the gtest and curl builds in thirdparty there are some license changes that should be made. I noticed these required changes as a result of a failing kudu-binary jar license check. Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 --- M thirdparty/LICENSE.txt 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/17123/1 -- To view, visit http://gerrit.cloudera.org:8080/17123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I70cfc5b524e6ad0dc9a592f6728ff9f65752ff70 Gerrit-Change-Number: 17123 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke