[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16648 to look at the new patch set (#5). Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only .. (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only Currently, even though a non-leader TxnStatusManager will be unable to write to the underlying table (in the Raft subsystem the write would be aborted), we may want to restrict calls to be made by the leader TxnStatusManagers only. The motivation is to provide a more robust system, which avoids cases when the request was sent to a laggy follower, we may end up failing the request with an error. This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog Manager) to be used to ensure the requests were sent to leaders only and to block all other operations while reloading the persistent transaction status metadata upon leadership changes. Note that during failover the leader replica will wait until in-flight ops in the previous consensus term to be applied before reloading the metadata. Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a --- A src/2.patch A src/3.patch M src/kudu/client/client-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica-test-base.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_status_tablet.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 20 files changed, 2,011 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/5 -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [java] Add JAAS example to client API doc
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16940 ) Change subject: [java] Add JAAS example to client API doc .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@201 PS1, Line 201: conntect > nit: connect I like the idea of showing the construction of the kudu client like below. http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@211 PS1, Line 211: my-ap > nit: not related to this patch, but how about naming this 'app-user' or som +1 -- To view, visit http://gerrit.cloudera.org:8080/16940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b Gerrit-Change-Number: 16940 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 12 Jan 2021 04:01:47 + Gerrit-HasComments: Yes
[kudu-CR] [master] KUDU-2181 Raft ChangeConfig request to remove a master
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16936 ) Change subject: [master] KUDU-2181 Raft ChangeConfig request to remove a master .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/16936/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16936/1//COMMIT_MSG@12 PS1, Line 12: tests I'm curious what's the behavior of Kudu client when a master is removed underneath after the client has connected to the cluster. Does it make sense to add such tests for both C++ and Java clients? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@254 PS1, Line 254: cluster nit: it would be great to document the 'cluster' parameter as well in the comment above http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@938 PS1, Line 938: non_leader_master_idx Does it matter whether it's leader or non-leader master index? I guess if the functionality is controlled by feature flags, this should matter since the feature flags verification happens before any application-specific code is invoked. As such, I guess here we should check both masters and see the same error, right? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@995 PS1, Line 995: It's possible there is a leadership change between querying for leader master and : // sending the add master request to non-leader master and hence using ASSERT_EVENTUALLY. > Is this not a concern in other tests? Would setting --leader_failure_max_mi In tablet server tests there is another way to protect against this -- disable master elections and then issue an election request manually. This might be an option here as well, I guess? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1009 PS1, Line 1009: ASSERT_OK(cluster_->master_proxy(non_leader_master_idx)->RemoveMaster(req, , )); > What if this succeeds because we happened to elect non_leader_master_idx as Yep, I'm curious whether this might end up in removing one master, and then trying to remove the only one which is left? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1019 PS1, Line 1019: TEST_F(DynamicMultiMasterTest, TestRemoveLeaderMaster) { Does it make sense to add a simple scenario to verify that a single and the only master cannot be removed from a cluster? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.h File src/kudu/master/master.h: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.h@131 PS1, Line 131: of nit: of initiating ? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc File src/kudu/master/master.cc: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@523 PS1, Line 523: if (HostPortFromPB(peer.last_known_addr()) == hp Could there be more than one master satisfying this criterion? If so, what should be the behavior -- just remove the first matching one? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@532 PS1, Line 532: leader master nit: I guess it might be not only the leader, right? Maybe, just report that a master cannot remove itself from the config? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.proto@941 PS1, Line 941: HostPortPB Why HostPostPB instead of UUID as an identifier of existing server is better in this context? -- To view, visit http://gerrit.cloudera.org:8080/16936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76c03b8850faef60b65f85184c0a4db7cc6759ee Gerrit-Change-Number: 16936 Gerrit-PatchSet: 1 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 12 Jan 2021 03:50:02 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16941 to look at the new patch set (#2). Change subject: KUDU-3205: Fix building scan tokens when tablet not found errors occur .. KUDU-3205: Fix building scan tokens when tablet not found errors occur When tablet not found errors occur AsyncKuduClient.invalidateTabletCache is called which then calls RemoteTablet.removeTabletClient to remove a server (by uuid) from the RemoteTablet tabletServers list. This means that the RemoteTablet can have a replica returned from RemoteTablet.getReplicas which has a server that is not returned by RemoteTablet.getTabletServersCopy. This results in a NPE when building a scan token because there is no matching server for the replica in the serverIndexMap. To fix this I simply check if the serverIndex found via the serverIndexMap is null and ignore that replica if it is. A better long term fix is likely to build the server info from the replicas themself, or to change the RemoteTablet behavior to blacklist servers instead of removing them from the actual tabletServers list, or to also remove the replica itself when RemoteTablet.removeTabletClient is called. I didn’t do any of these options now because they are all larger changes with a wider potential impact. Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java 4 files changed, 55 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/16941/2 -- To view, visit http://gerrit.cloudera.org:8080/16941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c Gerrit-Change-Number: 16941 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [util] add a few new metrics in MaintenanceManager
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16938 ) Change subject: [util] add a few new metrics in MaintenanceManager .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/16938/2/src/kudu/util/maintenance_manager-test.cc File src/kudu/util/maintenance_manager-test.cc: PS2: Would be great to get some basic test coverage for our new metrics here. http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc File src/kudu/util/maintenance_manager.cc: http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc@395 PS1, Line 395: WAL > Yup, I'm going to do so. I'm also adding a few related metrics, like a his Nice! http://gerrit.cloudera.org:8080/#/c/16938/2/src/kudu/util/maintenance_manager_metrics.cc File src/kudu/util/maintenance_manager_metrics.cc: http://gerrit.cloudera.org:8080/#/c/16938/2/src/kudu/util/maintenance_manager_metrics.cc@26 PS2, Line 26: "Number of times when calling Prepare() on a maintenance " : "operation failed", nit: it might be worth mentioning why Prepare() might fail, e.g. if an op collides with another running instance of that op. -- To view, visit http://gerrit.cloudera.org:8080/16938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486 Gerrit-Change-Number: 16938 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 12 Jan 2021 01:55:58 + Gerrit-HasComments: Yes
[kudu-CR] [java] Add JAAS example to client API doc
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16940 ) Change subject: [java] Add JAAS example to client API doc .. Patch Set 1: (3 comments) Thanks for putting this change together! http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@201 PS1, Line 201: conntect nit: connect Alternatively, "Create the KuduClient" or somesuch? Or maybe copy the actual builder code from below? Might not be clear what "connect" means, since there's no public "connect" call. http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@211 PS1, Line 211: my-ap nit: not related to this patch, but how about naming this 'app-user' or something so it's clear it's a user. Should be obvious from the UGI docs, but still probably worth clearer naming. Better yet, it'd be great if both of these examples used consistent user/principals. E.g. point the JAAS example to /path/to/app.keytab and use feasible example users/principals. http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@220 PS1, Line 220: will also automatically : * start a thread to periodically re-login from the keytab. Is anything equivalent done for JAAS? Do JAAS users need to worry about doing this themselves? -- To view, visit http://gerrit.cloudera.org:8080/16940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b Gerrit-Change-Number: 16940 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 12 Jan 2021 01:49:07 + Gerrit-HasComments: Yes
[kudu-CR] [master] KUDU-2181 Raft ChangeConfig request to remove a master
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16936 ) Change subject: [master] KUDU-2181 Raft ChangeConfig request to remove a master .. Patch Set 1: (16 comments) http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@346 PS1, Line 346: if (master != HostPort()) { : *req.mutable_rpc_addr() = HostPortToPB(master); : } What does it mean for this to be called with an unset 'master' field? Would be good to add a comment. Ah, made my way down to the test that uses this. Would it make sense to instead provide a garbage HostPort value from HostPort::ParseString? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@474 PS1, Line 474: TransferMasterLeadershipAndVerify nit: "verify" is a bit overloaded, especially when in a file that can misconstrue it with using the ClusterVerifier. How about calling this TransferMasterLeadership and calling TransferMasterLeadership TransferMasterLeadershipAsync? Or better yet, have them be the same method, but with an optional bool parameter that tells the method to wait for success. http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@787 PS1, Line 787: // When an ExternalMiniCluster is restarted after removal of a master then one of the : // remaining masters can get reassigned to the same wal dir which was previously assigned : // to the removed master. This causes problems during verification, so we always try to : // remove the last master in terms of index for test purposes. Does this problem go away if, rather than creating a new mini cluster, we just restart the old one after the call to RemoveMaster()? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@805 PS1, Line 805: // We need at least 1 operation for the leader in current term before initiating ChangeConfig. I thought this happens by replicating a no-op when first becoming leader. Or is this referring to something else? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@814 PS1, Line 814:ASSERT_EVENTUALLY([&] { : VerifyDeadMasterInSpecifiedState(master_to_remove_uuid, consensus::HealthReportPB::UNKNOWN); : }); It seems possible we'd trigger a test failure if, by luck, the gaps in ASSERT_EVENTUALLY missed this state entirely. It'd be more robust to just assert we get to FAILED, given it's a terminal state. http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@834 PS1, Line 834: ASSERT_FALSE(ContainsKey(actual_master_hps, master_to_remove)); nit: while I appreciate the dedicated checking, especially when it comes to longer tests like these, we can probably do without assertions that are easily verified by reading a handful of lines of nearby, relatively trivial code. In this case, this is redundant with the L833 check since we've removed `master_to_remove` from `expected_master_hps`. http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@853 PS1, Line 853: cluster_.reset(); Does cluster_->Shutdown(); ASSERT_OK(cluster_->Restart()); not work? I think it'd be much less surprising to further users of the EMC who may want to dynamically update the masters. http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@930 PS1, Line 930: /* Omitting "--master_support_change_config" */ nit: maybe just set it to false now so we don't have to update this test when we enable it by default? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@995 PS1, Line 995: It's possible there is a leadership change between querying for leader master and : // sending the add master request to non-leader master and hence using ASSERT_EVENTUALLY. Is this not a concern in other tests? Would setting --leader_failure_max_missed_heartbeat_periods to something very high help avoid this case? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1009 PS1, Line 1009: ASSERT_OK(cluster_->master_proxy(non_leader_master_idx)->RemoveMaster(req, , )); What if this succeeds because we happened to elect non_leader_master_idx as leader? Wouldn't the call to VerifyNumMastersAndGetAddresses never succeed? Or wouldn't we be unable to successfully remove another Master? http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc File src/kudu/master/master.cc: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@532 PS1, Line 532:
[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16941 ) Change subject: KUDU-3205: Fix building scan tokens when tablet not found errors occur .. Patch Set 1: Code-Review+1 (5 comments) http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java: http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@685 PS1, Line 685: if (serverIndex != null) { nit: for readability, consider if (serverIndex == null) { continue; } // business as usual http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java File java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java: http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java@263 PS1, Line 263:* Get replicas of this tablet. The returned list may not be mutated. > nit: do you think it's worth adding a note here expressing that the replica +1 http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java: http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@732 PS1, Line 732: KuduScanToken.KuduScanTokenBuilder tokenBuilder nit: maybe, add includeTabletMetadata(true) and includeTableMetadata(true) to be explicit what settings are important here, or maybe just add an ASSERT to make sure the builder or the generated token has those properties set. http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@733 PS1, Line 733: tokens Probably, that's not exactly related to this change, bust just in case: what happens in case of ReplicaSelection.CLOSEST_REPLICA mode and when the information on one of the replicas is missing in the token? Should client try to re-resolve the location of replicas in that case because the new replica might be placed on the same node where the client runs? http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@739 PS1, Line 739: tokens.get(0).intoScanner(newClient); Does it matter what's the role of the replica removed? Does it work as expected when a leader replica's metadata has been removed from the token, and the ReplicaSelection is set to ReplicaSelection.LEADER_ONLY? -- To view, visit http://gerrit.cloudera.org:8080/16941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c Gerrit-Change-Number: 16941 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 12 Jan 2021 01:31:50 + Gerrit-HasComments: Yes
[kudu-CR] [util] add a few new metrics in MaintenanceManager
Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16938 to look at the new patch set (#2). Change subject: [util] add a few new metrics in MaintenanceManager .. [util] add a few new metrics in MaintenanceManager This patch adds a couple of metrics for MaintenanceManager to track the duration of choosing the best candidate among available maintenance operations and number of times the Prepare() method for a maintenance operation failed: * maintenance_op_find_best_candidate_duration * maintenance_op_prepare_failed In addition, it adds SCOPED_LOG_SLOW_EXECUTION with the threshold of 10 seconds into the MaintenanceManager::FindBestOp() method. At this point, I manually verified that those metrics are present and show relevant information. I'm planning to add an automated test to cover the behavior of these new metrics in [1] to have less conflicts with the mentioned patch. The motivation for this change is a finding that FindBestOp()'s computational complexity is O(n^2) of the number of replicas per tablet server (each tablet replica registers about 8 maintenance operations). Also, BudgetedCompactionPolicy::RunApproximation()'s computational complexity is O(n^2) of the number of rowset in max and min keys. In the wild, there was an instance of a Kudu cluster with high data ingest ratio with the following stack showing in every snapshot in the diagnostic logs for many hours in a row: 0xa11735 kudu::tablet::BudgetedCompactionPolicy::RunApproximation() 0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets() 0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats() 0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats() 0x1b3de5c kudu::MaintenanceManager::FindBestOp() 0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread() 0x1b86014 kudu::Thread::SuperviseThread() [1] https://gerrit.cloudera.org/#/c/16937/ Change-Id: If5420afd605f9bd22207af142b49e73336907486 --- M src/kudu/master/master.cc M src/kudu/tserver/tablet_server.cc M src/kudu/util/CMakeLists.txt M src/kudu/util/maintenance_manager-test.cc M src/kudu/util/maintenance_manager.cc M src/kudu/util/maintenance_manager.h A src/kudu/util/maintenance_manager_metrics.cc A src/kudu/util/maintenance_manager_metrics.h 8 files changed, 138 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/16938/2 -- To view, visit http://gerrit.cloudera.org:8080/16938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486 Gerrit-Change-Number: 16938 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16941 ) Change subject: KUDU-3205: Fix building scan tokens when tablet not found errors occur .. Patch Set 1: Code-Review+1 (1 comment) LGTM, lint error aside. http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java File java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java: http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java@263 PS1, Line 263:* Get replicas of this tablet. The returned list may not be mutated. nit: do you think it's worth adding a note here expressing that the replicas may point at tablet servers not included in getTabletServersCopy(). It's added at the callsites, but it might be worth calling out closer to the method definitions too. -- To view, visit http://gerrit.cloudera.org:8080/16941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c Gerrit-Change-Number: 16941 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 23:56:47 + Gerrit-HasComments: Yes
[kudu-CR] [build] Fix the java compatibility checker script
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13004 ) Change subject: [build] Fix the java compatibility checker script .. [build] Fix the java compatibility checker script This patch fixes check_compatibility.py. Primarily it changed the existing script to use the Gradle build in place of the Maven build that no longer exists. It now uses japicmp in place of java_acc because java_acc had issues unzipping the jars created by the kudu build. Additionally japicmp’s filtering seemed to work better. The patch also updates the Gradle build to ensure generated Protobuf classes are annotated with @Generated to make report filtering easier. In follow on patches I may incorporate this into the LINT job and cause failure on API breakage. Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 Reviewed-on: http://gerrit.cloudera.org:8080/13004 Tested-by: Kudu Jenkins Reviewed-by: Greg Solovyev Reviewed-by: Alexey Serbin --- M build-support/check_compatibility.py M java/gradle/protobuf.gradle 2 files changed, 111 insertions(+), 91 deletions(-) Approvals: Kudu Jenkins: Verified Greg Solovyev: Looks good to me, but someone else must approve Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 Gerrit-Change-Number: 13004 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16941 Change subject: KUDU-3205: Fix building scan tokens when tablet not found errors occur .. KUDU-3205: Fix building scan tokens when tablet not found errors occur When tablet not found errors occur AsyncKuduClient.invalidateTabletCache is called which then calls RemoteTablet.removeTabletClient to remove a server (by uuid) from the RemoteTablet tabletServers list. This means that the RemoteTablet can have a replica returned from RemoteTablet.getReplicas which has a server that is not returned by RemoteTablet.getTabletServersCopy. This results in a NPE when building a scan token because there is no matching server for the replica in the serverIndexMap. To fix this I simply check if the serverIndex found via the serverIndexMap is null and ignore that replica if it is. A better long term fix is likely to build the server info from the replicas themself, or to change the RemoteTablet behavior to blacklist servers instead of removing them from the actual tabletServers list, or to also remove the replica itself when RemoteTablet.removeTabletClient is called. I didn’t do any of these options now because they are all larger changes with a wider potential impact. Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java 4 files changed, 62 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/16941/1 -- To view, visit http://gerrit.cloudera.org:8080/16941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c Gerrit-Change-Number: 16941 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke
[kudu-CR] [build] Fix the java compatibility checker script
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13004 ) Change subject: [build] Fix the java compatibility checker script .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 Gerrit-Change-Number: 13004 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 21:54:00 + Gerrit-HasComments: No
[kudu-CR] [java] Add JAAS example to client API doc
Hello Alexey Serbin, Andrew Wong, Grant Henke, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/16940 to review the following change. Change subject: [java] Add JAAS example to client API doc .. [java] Add JAAS example to client API doc The AsyncKuduClient class documents how to connect to a secure Kudu cluster using ticket cache or using a keytab when the application is using hadoop-common libraries. We can't always rely on UserGroupInformation though, so this commit expands the keytab-based login case by a JAAS config example. It also fixes a formatting issue with the "@Override" in the code, as it wasn't properly rendered. Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java 1 file changed, 34 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/16940/1 -- To view, visit http://gerrit.cloudera.org:8080/16940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b Gerrit-Change-Number: 16940 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke
[kudu-CR] [build] Fix the java compatibility checker script
Greg Solovyev has posted comments on this change. ( http://gerrit.cloudera.org:8080/13004 ) Change subject: [build] Fix the java compatibility checker script .. Patch Set 4: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py File build-support/check_compatibility.py: http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@121 PS3, Line 121: def find_client_jars(path): > I exclude jars because they are know to be non-public or application jars w Done http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@175 PS3, Line 175: if src_name == dst_name: > Currently the script uses super rudimentary arguments. I will enhance the a Done -- To view, visit http://gerrit.cloudera.org:8080/13004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 Gerrit-Change-Number: 13004 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 18:28:37 + Gerrit-HasComments: Yes
[kudu-CR] [build] Fix the java compatibility checker script
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13004 ) Change subject: [build] Fix the java compatibility checker script .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/13004/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13004/3//COMMIT_MSG@22 PS3, Line 22: incorparate > incorporate ? Done http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py File build-support/check_compatibility.py: http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@109 PS3, Line 109: if os.path.exists(get_japicmp_path()): : logging.info("japicmp is already downloaded.") > Will it work as expected when a new version of the tool is available? Mayb Done http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@115 PS3, Line 115: "curl", "-o" > nit: does it make sense to add few options to match the behavior of downloa Done http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@121 PS3, Line 121: return [j for j in all_jars if ( > Nit: if I am reading this correctly, this code is excluding 9 patterns in o I exclude jars because they are know to be non-public or application jars which do not have public API. I prefer a exclusion to explicit inclusion so that by default we scan and check APIs if a new jar is added. http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@175 PS3, Line 175: # TODO(ghenke): Add support for --error-on-binary-incompatibility and --error-on-source-incompatibility. > Currently, the script produces a lot of output for modified and added metho Currently the script uses super rudimentary arguments. I will enhance the arguments in a follow on patch once I understand what types of options a scheduled job might need. -- To view, visit http://gerrit.cloudera.org:8080/13004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 Gerrit-Change-Number: 13004 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 17:13:04 + Gerrit-HasComments: Yes
[kudu-CR] [build] Fix the java compatibility checker script
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Greg Solovyev, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13004 to look at the new patch set (#4). Change subject: [build] Fix the java compatibility checker script .. [build] Fix the java compatibility checker script This patch fixes check_compatibility.py. Primarily it changed the existing script to use the Gradle build in place of the Maven build that no longer exists. It now uses japicmp in place of java_acc because java_acc had issues unzipping the jars created by the kudu build. Additionally japicmp’s filtering seemed to work better. The patch also updates the Gradle build to ensure generated Protobuf classes are annotated with @Generated to make report filtering easier. In follow on patches I may incorporate this into the LINT job and cause failure on API breakage. Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 --- M build-support/check_compatibility.py M java/gradle/protobuf.gradle 2 files changed, 111 insertions(+), 91 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/13004/4 -- To view, visit http://gerrit.cloudera.org:8080/13004 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3 Gerrit-Change-Number: 13004 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [docker] Update the supported base images
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16877 ) Change subject: [docker] Update the supported base images .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py File docker/docker-build.py: http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py@67 PS3, Line 67: bionic > I'm not sure that's true -- in the prior version of this file, the DEFAULT_ ah, true. I suppose it may have been more arbitrary. I am happy to update it to be focal if you think that's a better choice. -- To view, visit http://gerrit.cloudera.org:8080/16877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5 Gerrit-Change-Number: 16877 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 16:30:49 + Gerrit-HasComments: Yes
[kudu-CR] [util] logging on slow execution in FindBestOp()
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16938 ) Change subject: [util] logging on slow execution in FindBestOp() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc File src/kudu/util/maintenance_manager.cc: http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc@395 PS1, Line 395: 3000 > If you haven't already it might be a good idea to try loading up a single t Yup, I'm going to do so. I'm also adding a few related metrics, like a histogram of how much time of FindBestOp() is taking. -- To view, visit http://gerrit.cloudera.org:8080/16938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486 Gerrit-Change-Number: 16938 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 16:13:34 + Gerrit-HasComments: Yes
[kudu-CR] [docker] Update the supported base images
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16877 ) Change subject: [docker] Update the supported base images .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py File docker/docker-build.py: http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py@67 PS3, Line 67: bionic > So far we have been using the oldest available release as the default. I ha I'm not sure that's true -- in the prior version of this file, the DEFAULT_OS was set to 'ubuntu:xenial', but the list of available releases for Ubuntu was 'ubuntu:trusty', 'ubuntu:xenial', 'ubuntu:bionic' and the oldest one among those was 'ubuntu:trusty, not 'ubuntu:xenial'. -- To view, visit http://gerrit.cloudera.org:8080/16877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5 Gerrit-Change-Number: 16877 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 16:11:55 + Gerrit-HasComments: Yes
[kudu-CR] [util] fix another lock contention in MaintenanceManager
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16934 ) Change subject: [util] fix another lock contention in MaintenanceManager .. [util] fix another lock contention in MaintenanceManager I had a chance to look at stack traces in tservers' diagnostic log files at a Kudu cluster with high data ingest ratio. There were many stack snapshots captured periodically every 30 seconds, but the pattern below presented in every snapshot in a row for several hours. tids=[4016] 0x7f64f36b05e0 0xa116c6 kudu::tablet::BudgetedCompactionPolicy::RunApproximation() 0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets() 0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats() 0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats() 0x1b3de5c kudu::MaintenanceManager::FindBestOp() 0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread() 0x1b86014 kudu::Thread::SuperviseThread() 0x7f64f36a8e25 start_thread 0x7f64f176f34d __clone tids=[48325,48324,48323] 0x7f64f36b05e0 0x7f64f36af42b __lll_lock_wait 0x7f64f36aadcb _L_lock_812 0x7f64f36aac98 __GI___pthread_mutex_lock 0x1b546fd kudu::Mutex::Acquire() 0x1b42913 kudu::MaintenanceManager::LaunchOp() 0x1b929cd kudu::FunctionRunnable::Run() 0x1b8fa87 kudu::ThreadPool::DispatchThread() 0x1b86014 kudu::Thread::SuperviseThread() 0x7f64f36a8e25 start_thread 0x7f64f176f34d __clone The thread 4016 above had acquired the MaintenanceManager::lock_ mutex and went calculating the scores for compaction candidates. Three other threads 48325, 48324, 48323 are waiting for the same mutex to be acquired upon returning from the MaintenanceManager::LaunchOp() method. These three maintenance threads were the only threads in the 'MaintenanceMgr' thread pool, i.e. no other threads were available to perform scheduled compaction operations. These three threads were blocked after performing scheduled maintenance operations, so they could not process any new compaction tasks already scheduled by the former thread. The more compaction candidates were there, the longer the maintenance threads were blocked waiting on the compaction score computation performed by the former thread. To relieve the contention, I updated the code to use separate mutexes for op-specific condition variables and the scheduler's condition variable. Now, op-specific condition variable uses the MaintenanceManager::running_instances_lock_, which is also used to guard access to the MaintenanceManager::running_instances_ container. This patch also fixes reporting on the duration of the compaction operations. Before this patch, the timings for compaction operations might be bloated in case of lock contention, especially in cases attributed to conditions resulting in the stacks like shown above. This patch doesn't contain a test to evaluate performance impact of this change: I'm planning to do so in a separate changelist. Change-Id: I63b12dd3641ef655f8fcbbad8d8ac515d874c0fb Reviewed-on: http://gerrit.cloudera.org:8080/16934 Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong --- M src/kudu/util/maintenance_manager.cc M src/kudu/util/maintenance_manager.h 2 files changed, 172 insertions(+), 130 deletions(-) Approvals: Kudu Jenkins: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16934 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I63b12dd3641ef655f8fcbbad8d8ac515d874c0fb Gerrit-Change-Number: 16934 Gerrit-PatchSet: 5 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [docker] Update the supported base images
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16877 ) Change subject: [docker] Update the supported base images .. [docker] Update the supported base images This patch updates the base images available to chose from in docker-build.py to match those listed on the installation documentation. Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5 Reviewed-on: http://gerrit.cloudera.org:8080/16877 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M docker/docker-build.py 1 file changed, 5 insertions(+), 6 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5 Gerrit-Change-Number: 16877 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [docker] Update the supported base images
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16877 ) Change subject: [docker] Update the supported base images .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py File docker/docker-build.py: http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py@67 PS3, Line 67: bionic > nit: I'm curious what's the criterion to use particular release out of avai So far we have been using the oldest available release as the default. I haven't really thought much about the choice. We can change it if you think we should in the future. -- To view, visit http://gerrit.cloudera.org:8080/16877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5 Gerrit-Change-Number: 16877 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 15:02:29 + Gerrit-HasComments: Yes
[kudu-CR] [java] allow disabling TLS transportation from java client
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/16887 ) Change subject: [java] allow disabling TLS transportation from java client .. Patch Set 5: > Patch Set 5: Code-Review-2 (changed my vote to -2 to make sure we don't accidentally submit) -- To view, visit http://gerrit.cloudera.org:8080/16887 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I762b9d6ef9257a7853684dd0aef315e5c5bba8ed Gerrit-Change-Number: 16887 Gerrit-PatchSet: 5 Gerrit-Owner: wangning <1994wangn...@gmail.com> Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: wangning <1994wangn...@gmail.com> Gerrit-Comment-Date: Mon, 11 Jan 2021 14:05:58 + Gerrit-HasComments: No
[kudu-CR] [java] allow disabling TLS transportation from java client
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/16887 ) Change subject: [java] allow disabling TLS transportation from java client .. Patch Set 5: Code-Review-2 -- To view, visit http://gerrit.cloudera.org:8080/16887 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I762b9d6ef9257a7853684dd0aef315e5c5bba8ed Gerrit-Change-Number: 16887 Gerrit-PatchSet: 5 Gerrit-Owner: wangning <1994wangn...@gmail.com> Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: wangning <1994wangn...@gmail.com> Gerrit-Comment-Date: Mon, 11 Jan 2021 14:05:33 + Gerrit-HasComments: No
[kudu-CR] [java] allow disabling TLS transportation from java client
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/16887 ) Change subject: [java] allow disabling TLS transportation from java client .. Patch Set 5: Code-Review-1 This is a nice feature, but right now we don't use many environment variables, especially on the client-side. I was also thinking about adding a similar config to *require* TLS as right now it's trivial to MITM and downgrade to clear text. If we used env vars, these two vars would collide. For these reasons, I'd rather we used code config which is a lot more robust and I feel like it's more widespread in the Java world. Env vars can still be added on the application-side if a user decides they want to control this behavior by env vars, but it's also possible to read the config from an XML or YAML config file as well if we opt to use code-configs. I realize this patch might not be the best place to discuss this, so maybe we should bring this up on dev@ instead. In the meantime, I'd recommend against merging this so I'm voting -1. -- To view, visit http://gerrit.cloudera.org:8080/16887 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I762b9d6ef9257a7853684dd0aef315e5c5bba8ed Gerrit-Change-Number: 16887 Gerrit-PatchSet: 5 Gerrit-Owner: wangning <1994wangn...@gmail.com> Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: wangning <1994wangn...@gmail.com> Gerrit-Comment-Date: Mon, 11 Jan 2021 14:05:06 + Gerrit-HasComments: No
[kudu-CR] KUDU-1644 use range partition info for pruning
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16903 ) Change subject: KUDU-1644 use range partition info for pruning .. KUDU-1644 use range partition info for pruning When pruning in-list predicate values, range partition can also be taken into consideration. Change-Id: I3f14f543cffd44f090026f344c9b06af13ea2e10 Reviewed-on: http://gerrit.cloudera.org:8080/16903 Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong --- M src/kudu/common/partition.cc M src/kudu/common/partition.h M src/kudu/common/scan_spec-test.cc M src/kudu/common/scan_spec.cc M src/kudu/common/scan_spec.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tserver/tablet_service.cc 7 files changed, 662 insertions(+), 277 deletions(-) Approvals: Kudu Jenkins: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16903 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3f14f543cffd44f090026f344c9b06af13ea2e10 Gerrit-Change-Number: 16903 Gerrit-PatchSet: 5 Gerrit-Owner: wangning <1994wangn...@gmail.com> Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: wangning <1994wangn...@gmail.com>
[kudu-CR] [util] logging on slow execution in FindBestOp()
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16938 ) Change subject: [util] logging on slow execution in FindBestOp() .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc File src/kudu/util/maintenance_manager.cc: http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc@395 PS1, Line 395: 3000 If you haven't already it might be a good idea to try loading up a single tserver with many tablets with many rowsets, and seeing the difference your contention patch makes. Curious how a tserver nearing the limit of the documented recommendations for tablet count and size would fare with respect to this timing value. -- To view, visit http://gerrit.cloudera.org:8080/16938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486 Gerrit-Change-Number: 16938 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 08:53:36 + Gerrit-HasComments: Yes
[kudu-CR] [util] fix another lock contention in MaintenanceManager
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16934 ) Change subject: [util] fix another lock contention in MaintenanceManager .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16934 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63b12dd3641ef655f8fcbbad8d8ac515d874c0fb Gerrit-Change-Number: 16934 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 11 Jan 2021 08:49:04 + Gerrit-HasComments: No
[kudu-CR] [util] extra test for MaintenanceManager
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16937 ) Change subject: [util] extra test for MaintenanceManager .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16937/1/src/kudu/util/maintenance_manager-test.cc File src/kudu/util/maintenance_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/16937/1/src/kudu/util/maintenance_manager-test.cc@748 PS1, Line 748: MonoDelta::FromMilliseconds(i % 8 + 1) Is this longer sleep time important if the goal of the test is to test for when the stats computation is slow? http://gerrit.cloudera.org:8080/#/c/16937/1/src/kudu/util/maintenance_manager-test.cc@749 PS1, Line 749: MonoDelta::FromMicroseconds(5) >From the results in the commit message, there wasn't a big difference in this >reported timings. I'm curious whether you experimented with larger values, >especially given the issue at hand stems from larger update stats times from >computing the best compaction. -- To view, visit http://gerrit.cloudera.org:8080/16937 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9e71b213583c000d4809dcfc885c1d31b3bb9d5 Gerrit-Change-Number: 16937 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 11 Jan 2021 08:48:43 + Gerrit-HasComments: Yes