[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@714 PS9, Line 714: orig_leader > nit: I guess this isn't necessary once the ASSERT_EVENTUALLY() below is gon Done http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@715 PS9, Line 715: TONED, / > nit: just put 3 here? Address it in a follow up patch. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457 PS9, Line 1457: nStatusM > Because the l.first_failed_status().ok() most likely will return not-OK for Hmm, makes sense. I will address it in a follow up commit. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 16 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 01 Feb 2021 22:57:56 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@714 PS9, Line 714: orig_leader nit: I guess this isn't necessary once the ASSERT_EVENTUALLY() below is gone. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@715 PS9, Line 715: TONED, / nit: just put 3 here? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457 PS9, Line 1457: nStatusM > I am not sure why we want to do that? Why don't we check for the rest table Because the l.first_failed_status().ok() most likely will return not-OK for the rest of the tables: l.first_failed_status().ok() doesn't depend on the elements in the container, but rather on the status of the replica behind this TxnStatusManager. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 16 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 17:11:34 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Reviewed-on: http://gerrit.cloudera.org:8080/16648 Tested-by: Hao Hao Reviewed-by: Andrew Wong --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 17 files changed, 850 insertions(+), 140 deletions(-) Approvals: Hao Hao: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 16 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774 PS15, Line 774: FLAGS_txn_status_tablet_inject_load_failure_error = true; > You mentioned on slack that this won't do io unless we flush. It's probably Ack -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 06:53:16 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774 PS15, Line 774: FLAGS_txn_status_tablet_inject_load_failure_error = true; > You can also use --env_inject_eio_globs to target specific directories. IIR You mentioned on slack that this won't do io unless we flush. It's probably more trouble than it's worth to use io errors here. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 06:40:36 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774 PS15, Line 774: FLAGS_txn_status_tablet_inject_load_failure_error = true; > Try to use --env_inject_eio but that seems to affect the log to be appended You can also use --env_inject_eio_globs to target specific directories. IIRC, each mini server stores all of its data in a single directory, so something like --env_inject_eio_globs={cluster_->mini_tablet_server(i)->server()->fs_manager()->GetDataRootDirs()[0]}/** -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 01:18:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774 PS15, Line 774: FLAGS_txn_status_tablet_inject_load_failure_error = true; > Do disk failures exercise this codepath? If so, we can just use --env_injec Try to use --env_inject_eio but that seems to affect the log to be appended. Or maybe you have suggestions on how to narrow down to only affect transaction status tablet? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 01:13:00 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774 PS15, Line 774: FLAGS_txn_status_tablet_inject_load_failure_error = true; Do disk failures exercise this codepath? If so, we can just use --env_inject_eio. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 00:58:28 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: Verified+1 Unrelated org.apache.kudu.client.TestKuduMetrics flaky test failure. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 31 Jan 2021 00:31:39 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has removed a vote on this change. Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 15: (2 comments) http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc File src/kudu/integration-tests/txn_status_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc@447 PS14, Line 447: ASSERT_EVENTUALLY([&]{ > nit: Why do we need to sleep for so long if we're already retrying? Done http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443 PS14, Line 443: : const int64_t term = consensus.CurrentTerm(); : // If the term has changed we assume the new leader is about to do the : // necessary work in its > Why might we get here? I thought we didn't want to crash at all? Can we get Good point! It is overkilled to crash here if there is any failures preventing read from the transaction status tablet. Updated. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 30 Jan 2021 23:51:19 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#15). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 17 files changed, 850 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/15 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 14: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443 PS14, Line 443: : // In all other cases non-OK status is considered fatal. : LOG(FATAL) << Substitute("$0 failed: $1", op_description, s.ToString()); : __builtin_unreachable(); > Why might we get here? I thought we didn't want to crash at all? Can we get Actually I'd be ok addressing this as a follow-up. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 30 Jan 2021 20:38:09 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 14: (2 comments) http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc File src/kudu/integration-tests/txn_status_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc@447 PS14, Line 447: SleepFor(MonoDelta::FromMilliseconds(5 * keepalive_interval_ms)); nit: Why do we need to sleep for so long if we're already retrying? http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443 PS14, Line 443: : // In all other cases non-OK status is considered fatal. : LOG(FATAL) << Substitute("$0 failed: $1", op_description, s.ToString()); : __builtin_unreachable(); Why might we get here? I thought we didn't want to crash at all? Can we get here if there's a disk failure or somesuch? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 30 Jan 2021 19:53:39 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 14: (4 comments) http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc@818 PS13, Line 818: class TxnStatusTableElectionStormITest : public TxnStatusTableITest { > Is this guaranteed to happen? If we need to change leadership, quiescing mi Done http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/master/txn_manager-test.cc File src/kudu/master/txn_manager-test.cc: PS13: > Why the changes in this file? Came across this code and saw some nits can be improved (but I guess also introduced some redundant dependency). Will revert it back. http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@214 PS13, Line 214: PREDICT_FALSE(!replica_status_.ok() || : FLAGS_txn_status_tablet_inject_uninitialized_leader_status_error) > nit: wrap in a single PREDICT_FALSE Done http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@400 PS13, Line 400: PREDICT_FALSE(!s.ok() || > nit: maybe wrap the entire thing in PREDICT_FALSE instead of just the flag Done -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 30 Jan 2021 08:21:54 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#14). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 17 files changed, 833 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/14 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc@818 PS13, Line 818: // Make sure a re-election happens before the following read. Is this guaranteed to happen? If we need to change leadership, quiescing might be an approach. Look around for 'mutable_quiescing()' in some existing tests. http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/master/txn_manager-test.cc File src/kudu/master/txn_manager-test.cc: PS13: Why the changes in this file? http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@214 PS13, Line 214: PREDICT_FALSE(!replica_status_.ok()) || : PREDICT_FALSE(FLAGS_txn_status_tablet_inject_uninitialized_leader_status_error) nit: wrap in a single PREDICT_FALSE http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@400 PS13, Line 400: !s.ok() || PREDICT_FALSE nit: maybe wrap the entire thing in PREDICT_FALSE instead of just the flag -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 13 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 29 Jan 2021 08:43:43 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 13: Verified+1 Unrelated flaky test ExternalMiniClusterTest.TestBasicOperation -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 13 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 29 Jan 2021 06:52:16 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has removed a vote on this change. Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 13 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#13). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 825 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/13 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 13 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 12: (10 comments) http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@329 PS11, Line 329: transaction status table and is > nit: just "transaction status table" Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@332 PS11, Line 332: More concretely, if this returns : // 'true', callers must call DecreaseTxnCoordinatorTaskCounter(). : // > nit: more concretely, if this returns 'true', callers must call DecreaseTxn Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@336 PS11, Line 336: > nit: these methods don't run anything, so how about "if the caller should r Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@340 PS11, Line 340: transaction status table : // and is > nit: here too Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc@1010 PS11, Line 1010: std::l > nit: could use DCHECK_GE Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h@100 PS11, Line 100: 'txn_coordinator' must be a 'TxnStatusManager*' because of the : // downcast in the initia > nit: more specifically, 'txn_coordinator' must be a TxnStatusManager* becau Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83 PS6, Line 83: false, : "Finalize commit > Sounds good. Curious if we have test coverage for this. Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@239 PS11, Line 239: > Do we have tests for this? Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@356 PS11, Line 356: > nit: why not put this directly in the if() condition? Same with other insta Done http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@358 PS11, Line 358: t get an unexpected error response and bail instead > nit: this might not be that useful. It'll be obvious that the tablet is shu Done -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 12 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 29 Jan 2021 02:32:45 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#12). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 825 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/12 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 12 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 11: (10 comments) Overall looks good, just more nits and a couple of test suggestions. http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@329 PS11, Line 329: transaction status manager table nit: just "transaction status table" http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@332 PS11, Line 332: Thus, the caller needs to call : // 'DecreaseTxnCoordinatorTaskCounter' after the task is executed or : // aborted to unblock the tablet shutting down process. nit: more concretely, if this returns 'true', callers must call DecreaseTxnCoordinatorTaskCounter(). Same below. http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@336 PS11, Line 336: successfully nit: these methods don't run anything, so how about "if the caller should run the staleness task" Same below http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@340 PS11, Line 340: transaction status manager : // table nit: here too http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc@1010 PS11, Line 1010: DCHECK nit: could use DCHECK_GE http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h@100 PS11, Line 100: The expect type of 'txn_coordinator' is a subclass of TxnCoordinator, : // e.g. TxnStatusManager. nit: more specifically, 'txn_coordinator' must be a TxnStatusManager* because of the down_cast, no? http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83 PS6, Line 83: ing all operations replicated " : "by the previou > After thinking more, I decide to remove the crash logic and just return, si Sounds good. Curious if we have test coverage for this. http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@239 PS11, Line 239: TABLET_NOT_RUNNING); Do we have tests for this? http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@356 PS11, Line 356: status_tablet_.tablet_replica_->IsShuttingDown(); nit: why not put this directly in the if() condition? Same with other instances with task_enabled http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@358 PS11, Line 358: "The tablet is already shutting down or shutdown. "; nit: this might not be that useful. It'll be obvious that the tablet is shutting down from surrounding messages. This should probably just say "Not loading TxnStatusManager" or something -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 21 Jan 2021 07:24:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 11: Verified+1 TSAN build failure doesn't seem to relate to the patch. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 19 Jan 2021 18:06:06 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has removed a vote on this change. Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#11). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 766 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/11 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 11 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 9: (7 comments) > Patch Set 9: > > (7 comments) > > IWYU isn't happy yet: > > >>> Fixing #includes in > >>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc' > @@ -32,6 +32,7 @@ > #include "kudu/common/wire_protocol.h" > #include "kudu/consensus/metadata.pb.h" > #include "kudu/consensus/raft_consensus.h" > +#include "kudu/gutil/casts.h" > #include "kudu/gutil/macros.h" > #include "kudu/gutil/map-util.h" > #include "kudu/gutil/port.h" > IWYU would have edited 1 files on your behalf. Thanks a lot for looking into the failure! http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304 PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished(); > How do we protect against the race when right after this 'txn_coordinator_t Are you asking after L304 which we waited 'txn_coordinator_task_counter_' to become zero. and then it again becomes non-zero and tablet is shutdown state. what happens if TxnStatusManager tries to access the tablet? In that case, the task will not be enabled/ran when the tablet is shutting down. So that ShouldRunTxnCoordinatorStalenessTask() will return false and no new tasks will run. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471 PS9, Line 471: if (state_ == STOPPING || state_ == STOPPED) { : return false; : } : return true; > This looks a bit strange: it returns 'true' even if 'state_' is RUNNING? Ah, good catch. Sorry it is the opposite meaning as what the method name implies here. Adjust the caller as well. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997 PS9, Line 997: state_ > What about SHUTDOWN state? SHUTDOWN is only set after the tablet is fully stopped. Which the raft consensus has been stopped and the call back should no longer be triggered. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009 PS9, Line 1009: txn_coordinator_task_counter_ > nit: is it expected that this counter goes negative? If not, maybe add CHE Done http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101 PS9, Line 101: txn_coordinator > nit: it would be great to add a comment about the expectations about the ty Done http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439 PS9, Line 1439: !r->CheckRunning().ok() > Does it mean that the TxnStalenessTrackerTask() will cease to run at this t Hmm, yeah, I guess we wouldn't want to stop the TxnStalenessTrackerTask() if some txn status tablets is shutdown. We may want to resume on checking the other tablets. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457 PS9, Line 1457: continue > Maybe, we should jump not just to the next element 'replicas' container, bu I am not sure why we want to do that? Why don't we check for the rest tablets? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 19 Jan 2021 01:15:04 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#10). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 766 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/10 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 10 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 9: (7 comments) IWYU isn't happy yet: >>> Fixing #includes in >>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc' @@ -32,6 +32,7 @@ #include "kudu/common/wire_protocol.h" #include "kudu/consensus/metadata.pb.h" #include "kudu/consensus/raft_consensus.h" +#include "kudu/gutil/casts.h" #include "kudu/gutil/macros.h" #include "kudu/gutil/map-util.h" #include "kudu/gutil/port.h" IWYU would have edited 1 files on your behalf. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304 PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished(); How do we protect against the race when right after this 'txn_coordinator_task_counter_' becomes non-zero again and tablet_->Shutdown() is called when TxnStatusManager tries to access the tablet? Should we just nullify txn_coordinator_ in WaitUntilTxnCoordinatorTasksFinished() or set another variable that controls the return status of ShouldRunTxnCoordinatorStalenessTask() along with 'state_'? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471 PS9, Line 471: if (state_ == STOPPING || state_ == STOPPED) { : return false; : } : return true; This looks a bit strange: it returns 'true' even if 'state_' is RUNNING? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997 PS9, Line 997: state_ What about SHUTDOWN state? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009 PS9, Line 1009: txn_coordinator_task_counter_ nit: is it expected that this counter goes negative? If not, maybe add CHECK()/DCHECK() to enforce the invariant here? http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101 PS9, Line 101: txn_coordinator nit: it would be great to add a comment about the expectations about the type of this parameter. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439 PS9, Line 1439: !r->CheckRunning().ok() Does it mean that the TxnStalenessTrackerTask() will cease to run at this tablet server once one of its txn status tablets is shutdown (e.g., moved by other server by the rebalancer tool)? I'm not sure that's the behavior we want. http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457 PS9, Line 1457: continue Maybe, we should jump not just to the next element 'replicas' container, but onto next iteration of the outer 'while (true)' loop? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 16 Jan 2021 06:10:15 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc@426 PS6, Line 426: dynamic_cast > nit: I guess we prefer to use down_cast<> for this type of casts in the cod Done http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc@1262 PS7, Line 1262: dynamic_cast > nit: use down_cast<> here as well? Done http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453 PS7, Line 1453: TxnStatusManager* txn_status_manager = : dynamic_cast(coordinator); : TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager); > Sounds good to me, will update it. Done -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 15 Jan 2021 05:19:06 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#9). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 760 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/9 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 9 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453 PS7, Line 1453: TxnStatusManager* txn_status_manager = : dynamic_cast(coordinator); : TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager); > The rationale is to prevent code duplication: as I could see, all call site Sounds good to me, will update it. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 14 Jan 2021 21:02:44 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453 PS7, Line 1453: TxnStatusManager* txn_status_manager = : dynamic_cast(coordinator); : TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager); > Sounds doable for me, but wondering what is the rationale behind to perform The rationale is to prevent code duplication: as I could see, all call sites which use TxnStatusManager::ScopedLeaderSharedLock contain this dynamic_cast piece, so why not to provide a constructor with proper signature. Feel free to ignore this if you think it's not worth it. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 14 Jan 2021 20:42:00 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725 PS3, Line 725: FLAGS_leader_failure_max_missed_heartbeat_periods = 1; : FLAGS_raft_heartbeat_interval_ms = 30; > The short heartbeat interval combined with just single missing heartbeat pe Yes, when I looped I was using --stress_cpu_threads=16. http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453 PS7, Line 1453: TxnStatusManager* txn_status_manager = : dynamic_cast(coordinator); : TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager); > Looking at this again and again prompts me to ask the following question: I Sounds doable for me, but wondering what is the rationale behind to perform the downcast under the hood vs. doing it here? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 14 Jan 2021 19:56:43 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#8). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 760 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/8 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 7: (4 comments) Yep, I agree with Andrew: after quick looks this looks much better to me now! I hope to get one more pass tomorrow morning. As for now, just a few nits not related to the essence, but rather syntax and code conventions. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc@426 PS6, Line 426: dynamic_cast nit: I guess we prefer to use down_cast<> for this type of casts in the code; see src/kudu/gutil/casts.h for details http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725 PS3, Line 725: FLAGS_leader_failure_max_missed_heartbeat_periods = 1; : FLAGS_raft_heartbeat_interval_ms = 30; > Yes, ran looped the test in TSAN and didn't see failure caused by this. Is The short heartbeat interval combined with just single missing heartbeat period prompted me to make sure this test eventually converges when run under ASAN/TSAN with --stress_cpu_threads=16 or alike. But since you verified it works as needed, it's great! http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc@1262 PS7, Line 1262: dynamic_cast nit: use down_cast<> here as well? http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453 PS7, Line 1453: TxnStatusManager* txn_status_manager = : dynamic_cast(coordinator); : TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager); Looking at this again and again prompts me to ask the following question: Is it possible to introduce a constructor of ScopedLeaderSharedLock with a parameter of TxnCoordinator* type? The idea is to perform the downcast (using down_cast or dynamic_cast) under the hood, and then report an error via fist_failed_status() in debug mode if the coordinator could not be downcasted to TxnStatusManager? What do you think? -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 14 Jan 2021 08:34:25 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#7). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 760 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/7 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 6: (19 comments) http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744 PS6, Line 744: ; > nit: drop extra semicolon Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751 PS6, Line 751: threads.emplace_back([&] { > What does the multithreading add here? Seems like it complicates things sin The point is to ensure concurrent txn reads/writes have no failures with frequent leader elections. Add the comment to be clear. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758 PS6, Line 758: if (s.ok()) { > What failures are we expecting? Add a comment http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748 PS6, Line 748: std::atomic next_thread_id = 0; : vector txn_id_index = { 1, 6, 11, 16, 21}; : for (int t = 0; t < 5; t++) { : threads.emplace_back([&] { : int thread_id = next_thread_id++; : CHECK_LT(thread_id, txn_id_index.size()); : int start_txn_id = txn_id_index[thread_id]; : for (int i = start_txn_id; i < start_txn_id + 5; i++) { : TxnStatusEntryPB txn_status; : Status s = txn_sys_client_->BeginTransaction(i, kUser); : if (s.ok()) { : // Make sure a re-election happens before the following read. : SleepFor(MonoDelta::FromMilliseconds(3 * FLAGS_raft_heartbeat_interval_ms)); : s = txn_sys_client_->GetTransactionStatus(i, kUser, &txn_status); : CHECK_OK(s); : CHECK(txn_status.has_user()); : CHECK_STREQ(kUser, txn_status.user().c_str()); : CHECK(txn_status.has_state()); : CHECK_EQ(TxnStatePB::OPEN, txn_status.state()); : } : } : }); : } > nit: would be easier to read as: Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403 PS6, Line 403: << > nit: maybe add a message here too indicating what's going on Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472 PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down or shutdown. State: $0", :TabletStatePB_Name(state_)); > nit: might be cleaner to let the callers of these otherwise simple function Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@979 PS6, Line 979: if (!txn_coordinator_) { : return false; : } > nit: maybe only call this if txn_coordinator_ is set and change this to a D I updated the name to reflect the functionality which is ShouldRunTxnCoordinatorStalenessTask(). So keep it as it is here. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@997 PS6, Line 997: state_ == STOPPING || state_ == STOPPED) { > nit: I'm curious, why not reuse the != RUNNING condition from above? A comm Added a comment. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@109 PS6, Line 109: but some operations : // may still be legal. > I might be missing something but is this fact used anywhere? What operation Sorry this is not used, removed it. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@106 PS6, Line 106: const Status& replica_status() const { return replica_status_; } : : // Leadership status of the transaction status manager. If not OK, the : // transaction status manager is not the leader, but some operations : // may still be legal. : const Status& leader_status() const { : return leader_status_; : } > nit: Are these ever used? Done http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@118 PS6, Line 118: if (!replica_status_.ok()) { : return replica_status_;
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 ) Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. Patch Set 6: (27 comments) A bunch of nits but overall looking much better! http://gerrit.cloudera.org:8080/#/c/16648/5/src/2.patch File src/2.patch: PS5: Remove this. http://gerrit.cloudera.org:8080/#/c/16648/5/src/3.patch File src/3.patch: PS5: Remove this. http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/integration-tests/txn_status_manager-itest.cc File src/kudu/integration-tests/txn_status_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/integration-tests/txn_status_manager-itest.cc@447 PS5, Line 447: SleepFor(MonoDelta::FromMilliseconds(5 * keepalive_interval_ms)); Do we need this if we're repeating in ASSERT_EVENTUALLY? http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc File src/kudu/integration-tests/txn_status_table-itest.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744 PS6, Line 744: ; nit: drop extra semicolon http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751 PS6, Line 751: threads.emplace_back([&] { What does the multithreading add here? Seems like it complicates things since we'd have to worry about monotonically increasing transaction IDs. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758 PS6, Line 758: if (s.ok()) { What failures are we expecting? http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748 PS6, Line 748: std::atomic next_thread_id = 0; : vector txn_id_index = { 1, 6, 11, 16, 21}; : for (int t = 0; t < 5; t++) { : threads.emplace_back([&] { : int thread_id = next_thread_id++; : CHECK_LT(thread_id, txn_id_index.size()); : int start_txn_id = txn_id_index[thread_id]; : for (int i = start_txn_id; i < start_txn_id + 5; i++) { : TxnStatusEntryPB txn_status; : Status s = txn_sys_client_->BeginTransaction(i, kUser); : if (s.ok()) { : // Make sure a re-election happens before the following read. : SleepFor(MonoDelta::FromMilliseconds(3 * FLAGS_raft_heartbeat_interval_ms)); : s = txn_sys_client_->GetTransactionStatus(i, kUser, &txn_status); : CHECK_OK(s); : CHECK(txn_status.has_user()); : CHECK_STREQ(kUser, txn_status.user().c_str()); : CHECK(txn_status.has_state()); : CHECK_EQ(TxnStatePB::OPEN, txn_status.state()); : } : } : }); : } nit: would be easier to read as: const int kTxnsPerThread = 5; const int kNumThreads = 5; vector threads; for (int t = 0; t < kNumThreads; t++) { threads.emplace_back([t, &] { for (int i = 0; i < kNumTxnsPerThreads; i++) { // do stuff on txn (t * kNumTxnsPerThreads + i) } } } http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@395 PS5, Line 395: "Received notification of TxnStatusTablet state change " : << "but the raft consensus is not initialized. Tablet ID: " : << tablet_id << ". Reason: " << reason; nit: use Substitute. Same below. http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@407 PS5, Line 407: << nit: spacing alignment http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@997 PS5, Line 997: state_ == STOPPING || state_ == STOPPED I'm curious whether the code would work if we reused the != RUNNING condition. If so, we could use the same function at both callsites. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403 PS6, Line 403: << nit: maybe add a message here too indicating what's going on http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472 PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down or shutdown. State: $0", :TabletStatePB_Name(state_)); nit: might be cleaner to let the callers of these otherwise simple functions handle logging. http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/table
[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#6). Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 769 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/6 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)