[kudu-CR] [tool] KUDU-2181 CLI to remove master
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17010 ) Change subject: [tool] KUDU-2181 CLI to remove master .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/17010/8/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/17010/8/src/kudu/tools/tool_action_master.cc@574 PS8, Line 574: unique_ptr remove_master = : ActionBuilder("remove", &RemoveMasterChangeConfig) > This too is because of using an outdated branch locally like add CLI. Fixin Done -- To view, visit http://gerrit.cloudera.org:8080/17010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 Gerrit-Change-Number: 17010 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 18 Feb 2021 06:13:39 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to add master
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16530 ) Change subject: [tool] KUDU-2181 CLI to add master .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/dynamic_multi_master-test.cc@1023 PS9, Line 1023: // Prevent master leadership transfer for a deterministic test. : for (int i = 0 ; i < cluster_->num_masters(); i++) { : // Starting the cluster with following flag leads to a case sometimes : // wherein no leader gets elected leading to failure in ConnectToMaster() RPC. : // So instead set the flag after the cluster is running. : ASSERT_OK(cluster_->SetFlag(cluster_->master(i), "leader_failure_max_missed_heartbeat_periods", : "10.0")); : } > nit: does it make sense to separate this into a function/method and use it This was because of accidental use of older patchset when I switched branches locally. http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/master_service.cc@275 PS9, Line 275: return; > nit: does it make sense to add a LOG(WARNING) or LOG(INFO) about service a Done http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/kudu-tool-test.cc@1110 PS8, Line 1110: > Thanks for the catch. Looks like I resumed from a prior patchset locally. L Done -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 18 Feb 2021 06:13:34 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to remove master
Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17010 to look at the new patch set (#9). Change subject: [tool] KUDU-2181 CLI to remove master .. [tool] KUDU-2181 CLI to remove master This change adds CLI to remove master from the cluster which in turn invokes the RemoveMaster ChangeConfig RPC. This CLI will be part of the workflow to remove master from an existing cluster or recover dead master at the same HostPort. Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 --- M src/kudu/client/master_proxy_rpc.cc M src/kudu/master/dynamic_multi_master-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_master.cc 5 files changed, 140 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/17010/9 -- To view, visit http://gerrit.cloudera.org:8080/17010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 Gerrit-Change-Number: 17010 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tool] KUDU-2181 CLI to add master
Hello Mahesh Reddy, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16530 to look at the new patch set (#10). Change subject: [tool] KUDU-2181 CLI to add master .. [tool] KUDU-2181 CLI to add master This change adds a CLI that invokes the AddMaster RPC to perform Raft ChangeConfig. This CLI will be part of the workflow to migrate to multiple masters in a Kudu cluster. Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc --- M src/kudu/client/master_proxy_rpc.cc M src/kudu/master/dynamic_multi_master-test.cc M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc 8 files changed, 241 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/16530/10 -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 10 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [tool] KUDU-2181 CLI to add master
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16530 ) Change subject: [tool] KUDU-2181 CLI to add master .. Patch Set 9: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/dynamic_multi_master-test.cc@1023 PS9, Line 1023: // Prevent master leadership transfer for a deterministic test. : for (int i = 0 ; i < cluster_->num_masters(); i++) { : // Starting the cluster with following flag leads to a case sometimes : // wherein no leader gets elected leading to failure in ConnectToMaster() RPC. : // So instead set the flag after the cluster is running. : ASSERT_OK(cluster_->SetFlag(cluster_->master(i), "leader_failure_max_missed_heartbeat_periods", : "10.0")); : } nit: does it make sense to separate this into a function/method and use it here and below? http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/master/master_service.cc@275 PS9, Line 275: return; nit: does it make sense to add a LOG(WARNING) or LOG(INFO) about service a "duplicate" request? http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/16530/9/src/kudu/tools/tool_action_master.cc@145 PS9, Line 145: !s.ok() && nit: maybe, drop this part? -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 18 Feb 2021 00:10:51 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to remove master
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17010 ) Change subject: [tool] KUDU-2181 CLI to remove master .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/17010/8/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/17010/8/src/kudu/tools/tool_action_master.cc@574 PS8, Line 574: unique_ptr remove_master = : ActionBuilder("remove", &RemoveMasterChangeConfig) > I suppose it also makes sense to add coverage in kudu-tool-test for this as This too is because of using an outdated branch locally like add CLI. Fixing it. -- To view, visit http://gerrit.cloudera.org:8080/17010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 Gerrit-Change-Number: 17010 Gerrit-PatchSet: 8 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 18 Feb 2021 00:10:34 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to add master
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16530 ) Change subject: [tool] KUDU-2181 CLI to add master .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/kudu-tool-test.cc@1110 PS8, Line 1110: > nit: accidental revert? Thanks for the catch. Looks like I resumed from a prior patchset locally. Let me fix that. -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 17 Feb 2021 23:54:30 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to remove master
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17010 ) Change subject: [tool] KUDU-2181 CLI to remove master .. Patch Set 8: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/17010/7/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/17010/7/src/kudu/master/dynamic_multi_master-test.cc@874 PS7, Line 874: pda > if the directory is deleted then how will the restarted master become part of > Raft config. I suppose we'd also need to run through the "add master" workflow to do this, which is basically the "recover a master" test case that I suggested at some point. I'm fine merging this as is, provided you're planning on picking that up soon. http://gerrit.cloudera.org:8080/#/c/17010/8/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/17010/8/src/kudu/tools/tool_action_master.cc@574 PS8, Line 574: unique_ptr remove_master = : ActionBuilder("remove", &RemoveMasterChangeConfig) I suppose it also makes sense to add coverage in kudu-tool-test for this as well? -- To view, visit http://gerrit.cloudera.org:8080/17010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 Gerrit-Change-Number: 17010 Gerrit-PatchSet: 8 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 17 Feb 2021 23:49:39 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to add master
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16530 ) Change subject: [tool] KUDU-2181 CLI to add master .. Patch Set 9: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/kudu-tool-test.cc@1110 PS8, Line 1110: nit: accidental revert? -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 17 Feb 2021 23:45:50 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to remove master
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17010 ) Change subject: [tool] KUDU-2181 CLI to remove master .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/17010/7/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/17010/7/src/kudu/master/dynamic_multi_master-test.cc@874 PS7, Line 874: pda > It's not required because there is no ambiguity with other master. I'll add > it back for a positive case where uuid isn't strictly necessary. Done this part. > One way that comes to mind is to shutdown the master, delete the contents of > the master's directories, and restarting the same master. IIRC, that way the > master keeps its hostport, but assigns itself a new UUID. This is an interesting idea, so the deleted master still remains part of the Raft config and would be declared unavailable/unhealthy and a new master with same HostPort and different uuid shows up. I haven't tried it but I have some doubts, if the directory is deleted then how will the restarted master become part of Raft config. I'm going to table this particular test case of multiple masters with same HostPort for now. Filed KUDU-3246. -- To view, visit http://gerrit.cloudera.org:8080/17010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 Gerrit-Change-Number: 17010 Gerrit-PatchSet: 8 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 17 Feb 2021 23:27:21 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-2181 CLI to remove master
Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17010 to look at the new patch set (#8). Change subject: [tool] KUDU-2181 CLI to remove master .. [tool] KUDU-2181 CLI to remove master This change adds CLI to remove master from the cluster which in turn invokes the RemoveMaster ChangeConfig RPC. This CLI will be part of the workflow to remove master from an existing cluster or recover dead master at the same HostPort. Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 --- M src/kudu/client/master_proxy_rpc.cc M src/kudu/master/dynamic_multi_master-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_master.cc 4 files changed, 141 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/17010/8 -- To view, visit http://gerrit.cloudera.org:8080/17010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5c97b03475b0ffc7b387d7dfc17acc4b13858fb7 Gerrit-Change-Number: 17010 Gerrit-PatchSet: 8 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tool] KUDU-2181 CLI to add master
Hello Mahesh Reddy, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16530 to look at the new patch set (#9). Change subject: [tool] KUDU-2181 CLI to add master .. [tool] KUDU-2181 CLI to add master This change adds a CLI that invokes the AddMaster RPC to perform Raft ChangeConfig. This CLI will be part of the workflow to migrate to multiple masters in a Kudu cluster. Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc --- M src/kudu/client/master_proxy_rpc.cc M src/kudu/master/dynamic_multi_master-test.cc M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc 7 files changed, 227 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/16530/9 -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 9 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [tool] KUDU-2181 CLI to add master
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16530 ) Change subject: [tool] KUDU-2181 CLI to add master .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/master/dynamic_multi_master-test.cc@678 PS8, Line 678: return an error. > nit: needs an update, same below Done http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/16530/8/src/kudu/tools/tool_action_master.cc@146 PS8, Line 146: s.IsRemoteError() && s.ToString().find("Master already present") != string::npos; > +1 Done -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 8 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 17 Feb 2021 23:01:32 + Gerrit-HasComments: Yes
[kudu-CR] [master] fix race between shutdown and TxnManager init
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17076 ) Change subject: [master] fix race between shutdown and TxnManager init .. [master] fix race between shutdown and TxnManager init I saw the following TSAN report while running one of the newer tests introduced recently (TxnOpDispatcherITest.LifecycleBasic). As I can see, the race isn't related to the TxnOpDispatcher itself, but rather to the way how Master::state_ field is used in the master's code. WARNING: ThreadSanitizer: data race (pid=8313) Read of size 4 at 0x7b7806b0 by thread T165: #0 kudu::master::Master::InitTxnManagerTask() src/kudu/master/master.cc:312:9 (libmaster.so+0x26f28b) #1 kudu::master::Master::ScheduleTxnManagerInit()::$_1::operator()() const src/kudu/master/master.cc:299:46 (libmaster.so+0x2741a1) Previous write of size 4 at 0x7b7806b0 by main thread: #0 kudu::master::Master::ShutdownImpl() src/kudu/master/master.cc:406:12 (libmaster.so+0x26da08) #1 kudu::master::Master::Shutdown() src/kudu/master/master.h:74:5 (libmaster.so+0x275d49) #2 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:117:14 (libmaster.so+0x2a9c38) #3 kudu::cluster::InternalMiniCluster::ShutdownNodes(kudu::cluster::ClusterNodes) src/kudu/mini-cluster/internal_mini_cluster.cc:230:22 (libmini_cluster.so+0x84037) #4 kudu::cluster::MiniCluster::Shutdown() src/kudu/mini-cluster/mini_cluster.h:79:5 (libitest_util.so+0xa59d4) #5 kudu::cluster::InternalMiniCluster::~InternalMiniCluster() src/kudu/mini-cluster/internal_mini_cluster.cc:94:3 (libmini_cluster.so+0x82a03) This patch fixes the issue by making Master::state_ atomic. Change-Id: Ifa7541aa7dc7dbdb8e6af5c1f40edf23b850fc92 Reviewed-on: http://gerrit.cloudera.org:8080/17076 Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong --- M src/kudu/master/master.h 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Kudu Jenkins: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ifa7541aa7dc7dbdb8e6af5c1f40edf23b850fc92 Gerrit-Change-Number: 17076 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [master] fix race between shutdown and TxnManager init
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17076 ) Change subject: [master] fix race between shutdown and TxnManager init .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa7541aa7dc7dbdb8e6af5c1f40edf23b850fc92 Gerrit-Change-Number: 17076 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 17 Feb 2021 21:13:16 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 ) Change subject: KUDU-2612 tablet servers automatically register txn participants .. Patch Set 6: (12 comments) Still need to digest the life cycle of the dispatcher and its callbacks, but leaving some questions so far. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/client/client-test.cc@7694 PS6, Line 7694: } Maybe count the rows to verify the state? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@155 PS6, Line 155: txn_participant_registrant nit: doc this? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@405 PS6, Line 405: // nit: spurious line http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@456 PS6, Line 456: ready_, ops_queue_. nit: if so, would be nice to at least mention how users should reason about concurrency control for the other members http://gerrit.cloudera.org:8080/#/c/17037/5/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/17037/5/src/kudu/tablet/tablet_replica.cc@1262 PS5, Line 1262: const bool has_request_id = op->has_request_id(); > Yep, that's done for all pending write operations in the queue. Here the q If we've failed, and updated Init() such that failure calls the client callback, we would have already responded to the client and wouldn't need to add it to the queue. In any case, I think your new approach is good and less intrusive. I was at first worried that the response could be garbage after the std::move() but seems it's only freed after the callback is called. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@552 PS6, Line 552: dispatcher = FindOrNull(txn_op_dispatchers_, txn_id); : if (!dispatcher) { : dispatcher = &EmplaceOrDie( : &txn_op_dispatchers_, : std::piecewise_construct, : std::forward_as_tuple(txn_id), : std::forward_as_tuple(this, FLAGS_tablet_max_pending_txn_write_ops)); : } Does LookupOrEmplace work here? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@567 PS6, Line 567: SubmitTxnWriteCb nit: seems we typically name callbacks based on what we're calling back from, rather than what the callback does. Maybe call this RegisteredParticipantCb() or somesuch? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@590 PS6, Line 590: txn_op_dispatchers_.erase(it); Do we have any guarantees that we've submitted all the pending writes? Or at least, called all the callbacks? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@642 PS6, Line 642: txn_registrator nit: dispatcher? Same elsewhere http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@1227 PS6, Line 1227: transaction is not open" Don't we also unregister if we cancel due to not being the leader? Eg in SubmitTxnParticipantBeginOp() we may fail to submit the participant op? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@1258 PS6, Line 1258: // Store the information necessary to recreate WriteOp instance: this is : // useful if TabletReplica::SubmitWrite() returns non-OK status. nit: it'd be nice if you could mention the lifecycle of the request and response pointers. At first glance, it seems like these values could all be bogus upon the std::move() call, but I don't think that's the case. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@1304 PS6, Line 1304: Cleanup What's the point of returning a status here? Doesn't this always return 'status'? -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 17 Feb 2021 21:12:01 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 ) Change subject: KUDU-2612 tablet servers automatically register txn participants .. Patch Set 6: > unrelated build/test failures: > * DEBUG: strange issue with freeing gflags memory -- SIGSEGV in > libtcmalloc > * RELEASE: build failure due to python dependency issues (now fixed > with f24ec6a7757af421f6da3e5f4b93f3a6761d23bb) > * TSAN: a race between ScheduleTxnManagerInit() and > Master::ShutdownImpl() The latter race is addressed with http://gerrit.cloudera.org:8080/17076 -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 17 Feb 2021 19:35:27 + Gerrit-HasComments: No
[kudu-CR] [master] fix race between shutdown and TxnManager init
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17076 Change subject: [master] fix race between shutdown and TxnManager init .. [master] fix race between shutdown and TxnManager init I saw the following TSAN report while running one of the newer tests introduced recently (TxnOpDispatcherITest.LifecycleBasic). As I can see, the race isn't related to the TxnOpDispatcher itself, but rather to the way how Master::state_ field is used in the master's code. WARNING: ThreadSanitizer: data race (pid=8313) Read of size 4 at 0x7b7806b0 by thread T165: #0 kudu::master::Master::InitTxnManagerTask() src/kudu/master/master.cc:312:9 (libmaster.so+0x26f28b) #1 kudu::master::Master::ScheduleTxnManagerInit()::$_1::operator()() const src/kudu/master/master.cc:299:46 (libmaster.so+0x2741a1) Previous write of size 4 at 0x7b7806b0 by main thread: #0 kudu::master::Master::ShutdownImpl() src/kudu/master/master.cc:406:12 (libmaster.so+0x26da08) #1 kudu::master::Master::Shutdown() src/kudu/master/master.h:74:5 (libmaster.so+0x275d49) #2 kudu::master::MiniMaster::Shutdown() src/kudu/master/mini_master.cc:117:14 (libmaster.so+0x2a9c38) #3 kudu::cluster::InternalMiniCluster::ShutdownNodes(kudu::cluster::ClusterNodes) src/kudu/mini-cluster/internal_mini_cluster.cc:230:22 (libmini_cluster.so+0x84037) #4 kudu::cluster::MiniCluster::Shutdown() src/kudu/mini-cluster/mini_cluster.h:79:5 (libitest_util.so+0xa59d4) #5 kudu::cluster::InternalMiniCluster::~InternalMiniCluster() src/kudu/mini-cluster/internal_mini_cluster.cc:94:3 (libmini_cluster.so+0x82a03) This patch fixes the issue by making Master::state_ atomic. Change-Id: Ifa7541aa7dc7dbdb8e6af5c1f40edf23b850fc92 --- M src/kudu/master/master.h 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/17076/1 -- To view, visit http://gerrit.cloudera.org:8080/17076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifa7541aa7dc7dbdb8e6af5c1f40edf23b850fc92 Gerrit-Change-Number: 17076 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 ) Change subject: KUDU-2612 tablet servers automatically register txn participants .. Patch Set 6: Verified+1 unrelated build/test failures: * DEBUG: strange issue with freeing gflags memory -- SIGSEGV in libtcmalloc * RELEASE: build failure due to python dependency issues (now fixed with f24ec6a7757af421f6da3e5f4b93f3a6761d23bb) * TSAN: a race between ScheduleTxnManagerInit() and Master::ShutdownImpl() -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 17 Feb 2021 18:47:39 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Alexey Serbin has removed a vote on this change. Change subject: KUDU-2612 tablet servers automatically register txn participants .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)