[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-11 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#5).

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by 
the leader only
..

(wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
A src/2.patch
A src/3.patch
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
20 files changed, 2,011 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/5
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 5
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [java] Add JAAS example to client API doc

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16940 )

Change subject: [java] Add JAAS example to client API doc
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@201
PS1, Line 201: conntect
> nit: connect
I like the idea of showing the construction of the kudu client like below.


http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@211
PS1, Line 211: my-ap
> nit: not related to this patch, but how about naming this 'app-user' or som
+1



--
To view, visit http://gerrit.cloudera.org:8080/16940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b
Gerrit-Change-Number: 16940
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Jan 2021 04:01:47 +
Gerrit-HasComments: Yes


[kudu-CR] [master] KUDU-2181 Raft ChangeConfig request to remove a master

2021-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16936 )

Change subject: [master] KUDU-2181 Raft ChangeConfig request to remove a master
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16936/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16936/1//COMMIT_MSG@12
PS1, Line 12: tests
I'm curious what's the behavior of Kudu client when a master is removed 
underneath after the client has connected to the cluster.  Does it make sense 
to add such tests for both C++ and Java clients?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@254
PS1, Line 254: cluster
nit: it would be great to document the 'cluster' parameter as well in the 
comment above


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@938
PS1, Line 938: non_leader_master_idx
Does it matter whether it's leader or non-leader master index?  I guess if the 
functionality is controlled by feature flags, this should matter since the 
feature flags verification happens before any application-specific code is 
invoked.  As such, I guess here we should check both masters and see the same 
error, right?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@995
PS1, Line 995: It's possible there is a leadership change between querying for 
leader master and
 :   // sending the add master request to non-leader master and 
hence using ASSERT_EVENTUALLY.
> Is this not a concern in other tests? Would setting --leader_failure_max_mi
In tablet server tests there is another way to protect against this -- disable 
master elections and then issue an election request manually.  This might be an 
option here as well, I guess?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1009
PS1, Line 1009: 
ASSERT_OK(cluster_->master_proxy(non_leader_master_idx)->RemoveMaster(req, 
, ));
> What if this succeeds because we happened to elect non_leader_master_idx as
Yep, I'm curious whether this might end up in removing one master, and then 
trying to remove the only one which is left?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1019
PS1, Line 1019: TEST_F(DynamicMultiMasterTest, TestRemoveLeaderMaster) {
Does it make sense to add a simple scenario to verify that a single and the 
only master cannot be removed from a cluster?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.h
File src/kudu/master/master.h:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.h@131
PS1, Line 131: of
nit: of initiating ?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc
File src/kudu/master/master.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@523
PS1, Line 523: if (HostPortFromPB(peer.last_known_addr()) == hp
Could there be more than one master satisfying this criterion?  If so, what 
should be the behavior -- just remove the first matching one?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@532
PS1, Line 532: leader master
nit: I guess it might be not only the leader, right?  Maybe, just report that a 
master cannot remove itself from the config?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.proto@941
PS1, Line 941: HostPortPB
Why HostPostPB instead of UUID as an identifier of existing server is better in 
this context?



--
To view, visit http://gerrit.cloudera.org:8080/16936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I76c03b8850faef60b65f85184c0a4db7cc6759ee
Gerrit-Change-Number: 16936
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Jan 2021 03:50:02 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur

2021-01-11 Thread Grant Henke (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16941

to look at the new patch set (#2).

Change subject: KUDU-3205: Fix building scan tokens when tablet not found 
errors occur
..

KUDU-3205: Fix building scan tokens when tablet not found errors occur

When tablet not found errors occur AsyncKuduClient.invalidateTabletCache
is called which then calls RemoteTablet.removeTabletClient to remove
a server (by uuid) from the RemoteTablet tabletServers list. This means that
the RemoteTablet can have a replica returned from RemoteTablet.getReplicas
which has a server that is not returned by RemoteTablet.getTabletServersCopy.
This results in a NPE when building a scan token because there is no matching
server for the replica in the serverIndexMap.

To fix this I simply check if the serverIndex found via the serverIndexMap is
null and ignore that replica if it is. A better long term fix is likely to build
the server info from the replicas themself, or to change the RemoteTablet 
behavior
to blacklist servers instead of removing them from the actual tabletServers 
list, or
to also remove the replica itself when RemoteTablet.removeTabletClient is 
called.
I didn’t do any of these options now because they are all larger changes with a
wider potential impact.

Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
4 files changed, 55 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/16941/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c
Gerrit-Change-Number: 16941
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [util] add a few new metrics in MaintenanceManager

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16938 )

Change subject: [util] add a few new metrics in MaintenanceManager
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16938/2/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

PS2:
Would be great to get some basic test coverage for our new metrics here.


http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc@395
PS1, Line 395:  WAL
> Yup, I'm going to do so.  I'm also adding a few related metrics, like a his
Nice!


http://gerrit.cloudera.org:8080/#/c/16938/2/src/kudu/util/maintenance_manager_metrics.cc
File src/kudu/util/maintenance_manager_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/16938/2/src/kudu/util/maintenance_manager_metrics.cc@26
PS2, Line 26:   "Number of times when calling Prepare() on 
a maintenance "
:   "operation failed",
nit: it might be worth mentioning why Prepare() might fail, e.g. if an op 
collides with another running instance of that op.



--
To view, visit http://gerrit.cloudera.org:8080/16938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Jan 2021 01:55:58 +
Gerrit-HasComments: Yes


[kudu-CR] [java] Add JAAS example to client API doc

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16940 )

Change subject: [java] Add JAAS example to client API doc
..


Patch Set 1:

(3 comments)

Thanks for putting this change together!

http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@201
PS1, Line 201: conntect
nit: connect

Alternatively, "Create the KuduClient" or somesuch? Or maybe copy the actual 
builder code from below? Might not be clear what "connect" means, since there's 
no public "connect" call.


http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@211
PS1, Line 211: my-ap
nit: not related to this patch, but how about naming this 'app-user' or 
something so it's clear it's a user. Should be obvious from the UGI docs, but 
still probably worth clearer naming.

Better yet, it'd be great if both of these examples used consistent 
user/principals. E.g. point the JAAS example to /path/to/app.keytab and use 
feasible example users/principals.


http://gerrit.cloudera.org:8080/#/c/16940/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@220
PS1, Line 220: will also automatically
 :  * start a thread to periodically re-login from the keytab.
Is anything equivalent done for JAAS? Do JAAS users need to worry about doing 
this themselves?



--
To view, visit http://gerrit.cloudera.org:8080/16940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b
Gerrit-Change-Number: 16940
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Jan 2021 01:49:07 +
Gerrit-HasComments: Yes


[kudu-CR] [master] KUDU-2181 Raft ChangeConfig request to remove a master

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16936 )

Change subject: [master] KUDU-2181 Raft ChangeConfig request to remove a master
..


Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@346
PS1, Line 346:   if (master != HostPort()) {
 : *req.mutable_rpc_addr() = HostPortToPB(master);
 :   }
What does it mean for this to be called with an unset 'master' field? Would be 
good to add a comment.

Ah, made my way down to the test that uses this. Would it make sense to instead 
provide a garbage HostPort value from HostPort::ParseString?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@474
PS1, Line 474: TransferMasterLeadershipAndVerify
nit: "verify" is a bit overloaded, especially when in a file that can 
misconstrue it with using the ClusterVerifier. How about calling this 
TransferMasterLeadership and calling TransferMasterLeadership 
TransferMasterLeadershipAsync? Or better yet, have them be the same method, but 
with an optional bool parameter that tells the method to wait for success.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@787
PS1, Line 787: // When an ExternalMiniCluster is restarted after removal of a 
master then one of the
 :   // remaining masters can get reassigned to the same wal dir 
which was previously assigned
 :   // to the removed master. This causes problems during 
verification, so we always try to
 :   // remove the last master in terms of index for test purposes.
Does this problem go away if, rather than creating a new mini cluster, we just 
restart the old one after the call to RemoveMaster()?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@805
PS1, Line 805:   // We need at least 1 operation for the leader in current term 
before initiating ChangeConfig.
I thought this happens by replicating a no-op when first becoming leader. Or is 
this referring to something else?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@814
PS1, Line 814:ASSERT_EVENTUALLY([&] {
 :   VerifyDeadMasterInSpecifiedState(master_to_remove_uuid, 
consensus::HealthReportPB::UNKNOWN);
 : });
It seems possible we'd trigger a test failure if, by luck, the gaps in 
ASSERT_EVENTUALLY missed this state entirely. It'd be more robust to just 
assert we get to FAILED, given it's a terminal state.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@834
PS1, Line 834: ASSERT_FALSE(ContainsKey(actual_master_hps, master_to_remove));
nit: while I appreciate the dedicated checking, especially when it comes to 
longer tests like these, we can probably do without assertions that are easily 
verified by reading a handful of lines of nearby, relatively trivial code. In 
this case, this is redundant with the L833 check since we've removed 
`master_to_remove` from `expected_master_hps`.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@853
PS1, Line 853:   cluster_.reset();
Does

 cluster_->Shutdown();
 ASSERT_OK(cluster_->Restart());

not work? I think it'd be much less surprising to further users of the EMC who 
may want to dynamically update the masters.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@930
PS1, Line 930:  /* Omitting "--master_support_change_config" */
nit: maybe just set it to false now so we don't have to update this test when 
we enable it by default?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@995
PS1, Line 995: It's possible there is a leadership change between querying for 
leader master and
 :   // sending the add master request to non-leader master and 
hence using ASSERT_EVENTUALLY.
Is this not a concern in other tests? Would setting 
--leader_failure_max_missed_heartbeat_periods to something very high help avoid 
this case?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1009
PS1, Line 1009: 
ASSERT_OK(cluster_->master_proxy(non_leader_master_idx)->RemoveMaster(req, 
, ));
What if this succeeds because we happened to elect non_leader_master_idx as 
leader? Wouldn't the call to VerifyNumMastersAndGetAddresses never succeed? Or 
wouldn't we be unable to successfully remove another Master?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc
File src/kudu/master/master.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@532
PS1, Line 532: 

[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur

2021-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16941 )

Change subject: KUDU-3205: Fix building scan tokens when tablet not found 
errors occur
..


Patch Set 1: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@685
PS1, Line 685: if (serverIndex != null) {
nit: for readability, consider

if (serverIndex == null) {
  continue;
}

// business as usual


http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java:

http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java@263
PS1, Line 263:* Get replicas of this tablet. The returned list may not be 
mutated.
> nit: do you think it's worth adding a note here expressing that the replica
+1


http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java:

http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@732
PS1, Line 732: KuduScanToken.KuduScanTokenBuilder tokenBuilder
nit: maybe, add includeTabletMetadata(true) and includeTableMetadata(true) to 
be explicit what settings are important here, or maybe just add an ASSERT to 
make sure the builder or the generated token has those properties set.


http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@733
PS1, Line 733: tokens
Probably, that's not exactly related to this change, bust just in case: what 
happens in case of ReplicaSelection.CLOSEST_REPLICA mode and when the 
information on one of the replicas is missing in the token?  Should client try 
to re-resolve the location of replicas in that case because the new replica 
might be placed on the same node where the client runs?


http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java@739
PS1, Line 739:  tokens.get(0).intoScanner(newClient);
Does it matter what's the role of the replica removed?  Does it work as 
expected when a leader replica's metadata has been removed from the token, and 
the ReplicaSelection is set to ReplicaSelection.LEADER_ONLY?



--
To view, visit http://gerrit.cloudera.org:8080/16941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c
Gerrit-Change-Number: 16941
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Jan 2021 01:31:50 +
Gerrit-HasComments: Yes


[kudu-CR] [util] add a few new metrics in MaintenanceManager

2021-01-11 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16938

to look at the new patch set (#2).

Change subject: [util] add a few new metrics in MaintenanceManager
..

[util] add a few new metrics in MaintenanceManager

This patch adds a couple of metrics for MaintenanceManager to track the
duration of choosing the best candidate among available maintenance
operations and number of times the Prepare() method for a maintenance
operation failed:
  * maintenance_op_find_best_candidate_duration
  * maintenance_op_prepare_failed

In addition, it adds SCOPED_LOG_SLOW_EXECUTION with the threshold of
10 seconds into the MaintenanceManager::FindBestOp() method.

At this point, I manually verified that those metrics are present and
show relevant information.  I'm planning to add an automated test
to cover the behavior of these new metrics in [1] to have less conflicts
with the mentioned patch.

The motivation for this change is a finding that FindBestOp()'s
computational complexity is O(n^2) of the number of replicas per tablet
server (each tablet replica registers about 8 maintenance operations).
Also, BudgetedCompactionPolicy::RunApproximation()'s computational
complexity is O(n^2) of the number of rowset in max and min keys.
In the wild, there was an instance of a Kudu cluster with high data
ingest ratio with the following stack showing in every snapshot in the
diagnostic logs for many hours in a row:

 0xa11735 kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
 0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
 0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats()
 0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats()
0x1b3de5c kudu::MaintenanceManager::FindBestOp()
0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread()
0x1b86014 kudu::Thread::SuperviseThread()

[1] https://gerrit.cloudera.org/#/c/16937/

Change-Id: If5420afd605f9bd22207af142b49e73336907486
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
A src/kudu/util/maintenance_manager_metrics.cc
A src/kudu/util/maintenance_manager_metrics.h
8 files changed, 138 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/16938/2
--
To view, visit http://gerrit.cloudera.org:8080/16938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16941 )

Change subject: KUDU-3205: Fix building scan tokens when tablet not found 
errors occur
..


Patch Set 1: Code-Review+1

(1 comment)

LGTM, lint error aside.

http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java:

http://gerrit.cloudera.org:8080/#/c/16941/1/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java@263
PS1, Line 263:* Get replicas of this tablet. The returned list may not be 
mutated.
nit: do you think it's worth adding a note here expressing that the replicas 
may point at tablet servers not included in getTabletServersCopy(). It's added 
at the callsites, but it might be worth calling out closer to the method 
definitions too.



--
To view, visit http://gerrit.cloudera.org:8080/16941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c
Gerrit-Change-Number: 16941
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 23:56:47 +
Gerrit-HasComments: Yes


[kudu-CR] [build] Fix the java compatibility checker script

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13004 )

Change subject: [build] Fix the java compatibility checker script
..

[build] Fix the java compatibility checker script

This patch fixes check_compatibility.py. Primarily it
changed the existing script to use the Gradle build
in place of the Maven build that no longer exists.

It now uses japicmp in place of java_acc because
java_acc had issues unzipping the jars created by the
kudu build. Additionally japicmp’s filtering seemed to
work better.

The patch also updates the Gradle build to ensure
generated Protobuf classes are annotated with
@Generated to make report filtering easier.

In follow on patches I may incorporate this into the LINT job
and cause failure on API breakage.

Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
Reviewed-on: http://gerrit.cloudera.org:8080/13004
Tested-by: Kudu Jenkins
Reviewed-by: Greg Solovyev 
Reviewed-by: Alexey Serbin 
---
M build-support/check_compatibility.py
M java/gradle/protobuf.gradle
2 files changed, 111 insertions(+), 91 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Greg Solovyev: Looks good to me, but someone else must approve
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/13004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
Gerrit-Change-Number: 13004
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3205: Fix building scan tokens when tablet not found errors occur

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16941


Change subject: KUDU-3205: Fix building scan tokens when tablet not found 
errors occur
..

KUDU-3205: Fix building scan tokens when tablet not found errors occur

When tablet not found errors occur AsyncKuduClient.invalidateTabletCache
is called which then calls RemoteTablet.removeTabletClient to remove
a server (by uuid) from the RemoteTablet tabletServers list. This means that
the RemoteTablet can have a replica returned from RemoteTablet.getReplicas
which has a server that is not returned by RemoteTablet.getTabletServersCopy.
This results in a NPE when building a scan token because there is no matching
server for the replica in the serverIndexMap.

To fix this I simply check if the serverIndex found via the serverIndexMap is
null and ignore that replica if it is. A better long term fix is likely to build
the server info from the replicas themself, or to change the RemoteTablet 
behavior
to blacklist servers instead of removing them from the actual tabletServers 
list, or
to also remove the replica itself when RemoteTablet.removeTabletClient is 
called.
I didn’t do any of these options now because they are all larger changes with a
wider potential impact.

Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
4 files changed, 62 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/16941/1
--
To view, visit http://gerrit.cloudera.org:8080/16941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I68679dee1ad7ebca405dd6e086770f3e034e310c
Gerrit-Change-Number: 16941
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 


[kudu-CR] [build] Fix the java compatibility checker script

2021-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13004 )

Change subject: [build] Fix the java compatibility checker script
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
Gerrit-Change-Number: 13004
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 21:54:00 +
Gerrit-HasComments: No


[kudu-CR] [java] Add JAAS example to client API doc

2021-01-11 Thread Attila Bukor (Code Review)
Hello Alexey Serbin, Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/16940

to review the following change.


Change subject: [java] Add JAAS example to client API doc
..

[java] Add JAAS example to client API doc

The AsyncKuduClient class documents how to connect to a secure Kudu
cluster using ticket cache or using a keytab when the application is
using hadoop-common libraries. We can't always rely on
UserGroupInformation though, so this commit expands the keytab-based
login case by a JAAS config example.

It also fixes a formatting issue with the "@Override" in the code, as it
wasn't properly rendered.

Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
1 file changed, 34 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/16940/1
--
To view, visit http://gerrit.cloudera.org:8080/16940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7db5bf67727f7318272de7ea2156a33255a2dc8b
Gerrit-Change-Number: 16940
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 


[kudu-CR] [build] Fix the java compatibility checker script

2021-01-11 Thread Greg Solovyev (Code Review)
Greg Solovyev has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13004 )

Change subject: [build] Fix the java compatibility checker script
..


Patch Set 4: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py
File build-support/check_compatibility.py:

http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@121
PS3, Line 121: def find_client_jars(path):
> I exclude jars because they are know to be non-public or application jars w
Done


http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@175
PS3, Line 175:   if src_name == dst_name:
> Currently the script uses super rudimentary arguments. I will enhance the a
Done



--
To view, visit http://gerrit.cloudera.org:8080/13004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
Gerrit-Change-Number: 13004
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 18:28:37 +
Gerrit-HasComments: Yes


[kudu-CR] [build] Fix the java compatibility checker script

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13004 )

Change subject: [build] Fix the java compatibility checker script
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13004/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13004/3//COMMIT_MSG@22
PS3, Line 22: incorparate
> incorporate ?
Done


http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py
File build-support/check_compatibility.py:

http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@109
PS3, Line 109:   if os.path.exists(get_japicmp_path()):
 : logging.info("japicmp is already downloaded.")
> Will it work as expected when a new version of the tool is available?  Mayb
Done


http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@115
PS3, Line 115: "curl", "-o"
> nit: does it make sense to add few options to match the behavior of downloa
Done


http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@121
PS3, Line 121:   return [j for j in all_jars if (
> Nit: if I am reading this correctly, this code is excluding 9 patterns in o
I exclude jars because they are know to be non-public or application jars which 
do not have public API. I prefer a exclusion to explicit inclusion so that by 
default we scan and check APIs if a new jar is added.


http://gerrit.cloudera.org:8080/#/c/13004/3/build-support/check_compatibility.py@175
PS3, Line 175:   # TODO(ghenke): Add support for 
--error-on-binary-incompatibility and --error-on-source-incompatibility.
> Currently, the script produces a lot of output for modified and added metho
Currently the script uses super rudimentary arguments. I will enhance the 
arguments in a follow on patch once I understand what types of options a 
scheduled job might need.



--
To view, visit http://gerrit.cloudera.org:8080/13004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
Gerrit-Change-Number: 13004
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 17:13:04 +
Gerrit-HasComments: Yes


[kudu-CR] [build] Fix the java compatibility checker script

2021-01-11 Thread Grant Henke (Code Review)
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Greg Solovyev, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13004

to look at the new patch set (#4).

Change subject: [build] Fix the java compatibility checker script
..

[build] Fix the java compatibility checker script

This patch fixes check_compatibility.py. Primarily it
changed the existing script to use the Gradle build
in place of the Maven build that no longer exists.

It now uses japicmp in place of java_acc because
java_acc had issues unzipping the jars created by the
kudu build. Additionally japicmp’s filtering seemed to
work better.

The patch also updates the Gradle build to ensure
generated Protobuf classes are annotated with
@Generated to make report filtering easier.

In follow on patches I may incorporate this into the LINT job
and cause failure on API breakage.

Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
---
M build-support/check_compatibility.py
M java/gradle/protobuf.gradle
2 files changed, 111 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/13004/4
--
To view, visit http://gerrit.cloudera.org:8080/13004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id482da0b3aa5ea35fcfb2b674f0d7ae413045ca3
Gerrit-Change-Number: 13004
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [docker] Update the supported base images

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16877 )

Change subject: [docker] Update the supported base images
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py
File docker/docker-build.py:

http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py@67
PS3, Line 67: bionic
> I'm not sure that's true -- in the prior version of this file, the DEFAULT_
ah, true. I suppose it may have been more arbitrary. I am happy to update it to 
be focal if you think that's a better choice.



--
To view, visit http://gerrit.cloudera.org:8080/16877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5
Gerrit-Change-Number: 16877
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 16:30:49 +
Gerrit-HasComments: Yes


[kudu-CR] [util] logging on slow execution in FindBestOp()

2021-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16938 )

Change subject: [util] logging on slow execution in FindBestOp()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc@395
PS1, Line 395: 3000
> If you haven't already it might be a good idea to try loading up a single t
Yup, I'm going to do so.  I'm also adding a few related metrics, like a 
histogram of how much time of FindBestOp() is taking.



--
To view, visit http://gerrit.cloudera.org:8080/16938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 16:13:34 +
Gerrit-HasComments: Yes


[kudu-CR] [docker] Update the supported base images

2021-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16877 )

Change subject: [docker] Update the supported base images
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py
File docker/docker-build.py:

http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py@67
PS3, Line 67: bionic
> So far we have been using the oldest available release as the default. I ha
I'm not sure that's true -- in the prior version of this file, the DEFAULT_OS 
was set to 'ubuntu:xenial', but the list of available releases for Ubuntu was 
'ubuntu:trusty', 'ubuntu:xenial', 'ubuntu:bionic' and the oldest one among 
those was 'ubuntu:trusty, not 'ubuntu:xenial'.



--
To view, visit http://gerrit.cloudera.org:8080/16877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5
Gerrit-Change-Number: 16877
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 16:11:55 +
Gerrit-HasComments: Yes


[kudu-CR] [util] fix another lock contention in MaintenanceManager

2021-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16934 )

Change subject: [util] fix another lock contention in MaintenanceManager
..

[util] fix another lock contention in MaintenanceManager

I had a chance to look at stack traces in tservers' diagnostic log files
at a Kudu cluster with high data ingest ratio.  There were many stack
snapshots captured periodically every 30 seconds, but the pattern below
presented in every snapshot in a row for several hours.

  tids=[4016]
  0x7f64f36b05e0 
0xa116c6 kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
0xa129c9 kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
0x9c8d80 kudu::tablet::Tablet::UpdateCompactionStats()
0x9ec848 kudu::tablet::CompactRowSetsOp::UpdateStats()
   0x1b3de5c kudu::MaintenanceManager::FindBestOp()
   0x1b3f3c5 kudu::MaintenanceManager::RunSchedulerThread()
   0x1b86014 kudu::Thread::SuperviseThread()
  0x7f64f36a8e25 start_thread
  0x7f64f176f34d __clone
  tids=[48325,48324,48323]
  0x7f64f36b05e0 
  0x7f64f36af42b __lll_lock_wait
  0x7f64f36aadcb _L_lock_812
  0x7f64f36aac98 __GI___pthread_mutex_lock
   0x1b546fd kudu::Mutex::Acquire()
   0x1b42913 kudu::MaintenanceManager::LaunchOp()
   0x1b929cd kudu::FunctionRunnable::Run()
   0x1b8fa87 kudu::ThreadPool::DispatchThread()
   0x1b86014 kudu::Thread::SuperviseThread()
  0x7f64f36a8e25 start_thread
  0x7f64f176f34d __clone

The thread 4016 above had acquired the MaintenanceManager::lock_ mutex
and went calculating the scores for compaction candidates.  Three other
threads 48325, 48324, 48323 are waiting for the same mutex to be
acquired upon returning from the MaintenanceManager::LaunchOp() method.
These three maintenance threads were the only threads in the
'MaintenanceMgr' thread pool, i.e. no other threads were available to
perform scheduled compaction operations.  These three threads were
blocked after performing scheduled maintenance operations, so they could
not process any new compaction tasks already scheduled by the former
thread.  The more compaction candidates were there, the longer the
maintenance threads were blocked waiting on the compaction score
computation performed by the former thread.

To relieve the contention, I updated the code to use separate mutexes
for op-specific condition variables and the scheduler's condition
variable.  Now, op-specific condition variable uses the
MaintenanceManager::running_instances_lock_, which is also used to guard
access to the MaintenanceManager::running_instances_ container.

This patch also fixes reporting on the duration of the compaction
operations.  Before this patch, the timings for compaction operations
might be bloated in case of lock contention, especially in cases
attributed to conditions resulting in the stacks like shown above.

This patch doesn't contain a test to evaluate performance impact of
this change: I'm planning to do so in a separate changelist.

Change-Id: I63b12dd3641ef655f8fcbbad8d8ac515d874c0fb
Reviewed-on: http://gerrit.cloudera.org:8080/16934
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong 
---
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
2 files changed, 172 insertions(+), 130 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I63b12dd3641ef655f8fcbbad8d8ac515d874c0fb
Gerrit-Change-Number: 16934
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [docker] Update the supported base images

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16877 )

Change subject: [docker] Update the supported base images
..

[docker] Update the supported base images

This patch updates the base images available to chose from in
docker-build.py to match those listed on the installation documentation.

Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5
Reviewed-on: http://gerrit.cloudera.org:8080/16877
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M docker/docker-build.py
1 file changed, 5 insertions(+), 6 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5
Gerrit-Change-Number: 16877
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [docker] Update the supported base images

2021-01-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16877 )

Change subject: [docker] Update the supported base images
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py
File docker/docker-build.py:

http://gerrit.cloudera.org:8080/#/c/16877/3/docker/docker-build.py@67
PS3, Line 67: bionic
> nit: I'm curious what's the criterion to use particular release out of avai
So far we have been using the oldest available release as the default. I 
haven't really thought much about the choice. We can change it if you think we 
should in the future.



--
To view, visit http://gerrit.cloudera.org:8080/16877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80b90f77f803a503f63f616a61080dcde4ee35a5
Gerrit-Change-Number: 16877
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 15:02:29 +
Gerrit-HasComments: Yes


[kudu-CR] [java] allow disabling TLS transportation from java client

2021-01-11 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16887 )

Change subject: [java] allow disabling TLS transportation from java client
..


Patch Set 5:

> Patch Set 5: Code-Review-2

(changed my vote to -2 to make sure we don't accidentally submit)


--
To view, visit http://gerrit.cloudera.org:8080/16887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I762b9d6ef9257a7853684dd0aef315e5c5bba8ed
Gerrit-Change-Number: 16887
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <1994wangn...@gmail.com>
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: wangning <1994wangn...@gmail.com>
Gerrit-Comment-Date: Mon, 11 Jan 2021 14:05:58 +
Gerrit-HasComments: No


[kudu-CR] [java] allow disabling TLS transportation from java client

2021-01-11 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16887 )

Change subject: [java] allow disabling TLS transportation from java client
..


Patch Set 5: Code-Review-2


--
To view, visit http://gerrit.cloudera.org:8080/16887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I762b9d6ef9257a7853684dd0aef315e5c5bba8ed
Gerrit-Change-Number: 16887
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <1994wangn...@gmail.com>
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: wangning <1994wangn...@gmail.com>
Gerrit-Comment-Date: Mon, 11 Jan 2021 14:05:33 +
Gerrit-HasComments: No


[kudu-CR] [java] allow disabling TLS transportation from java client

2021-01-11 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16887 )

Change subject: [java] allow disabling TLS transportation from java client
..


Patch Set 5: Code-Review-1

This is a nice feature, but right now we don't use many environment variables, 
especially on the client-side. I was also thinking about adding a similar 
config to *require* TLS as right now it's trivial to MITM and downgrade to 
clear text. If we used env vars, these two vars would collide.

For these reasons, I'd rather we used code config which is a lot more robust 
and I feel like it's more widespread in the Java world. Env vars can still be 
added on the application-side if a user decides they want to control this 
behavior by env vars, but it's also possible to read the config from an XML or 
YAML config file as well if we opt to use code-configs.

I realize this patch might not be the best place to discuss this, so maybe we 
should bring this up on dev@ instead. In the meantime, I'd recommend against 
merging this so I'm voting -1.


--
To view, visit http://gerrit.cloudera.org:8080/16887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I762b9d6ef9257a7853684dd0aef315e5c5bba8ed
Gerrit-Change-Number: 16887
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <1994wangn...@gmail.com>
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: wangning <1994wangn...@gmail.com>
Gerrit-Comment-Date: Mon, 11 Jan 2021 14:05:06 +
Gerrit-HasComments: No


[kudu-CR] KUDU-1644 use range partition info for pruning

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16903 )

Change subject: KUDU-1644 use range partition info for pruning
..

KUDU-1644 use range partition info for pruning

When pruning in-list predicate values, range partition can also be taken
into consideration.

Change-Id: I3f14f543cffd44f090026f344c9b06af13ea2e10
Reviewed-on: http://gerrit.cloudera.org:8080/16903
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong 
---
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tserver/tablet_service.cc
7 files changed, 662 insertions(+), 277 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16903
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3f14f543cffd44f090026f344c9b06af13ea2e10
Gerrit-Change-Number: 16903
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <1994wangn...@gmail.com>
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Reviewer: wangning <1994wangn...@gmail.com>


[kudu-CR] [util] logging on slow execution in FindBestOp()

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16938 )

Change subject: [util] logging on slow execution in FindBestOp()
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16938/1/src/kudu/util/maintenance_manager.cc@395
PS1, Line 395: 3000
If you haven't already it might be a good idea to try loading up a single 
tserver with many tablets with many rowsets, and seeing the difference your 
contention patch makes. Curious how a tserver nearing the limit of the 
documented recommendations for tablet count and size would fare with respect to 
this timing value.



--
To view, visit http://gerrit.cloudera.org:8080/16938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If5420afd605f9bd22207af142b49e73336907486
Gerrit-Change-Number: 16938
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 08:53:36 +
Gerrit-HasComments: Yes


[kudu-CR] [util] fix another lock contention in MaintenanceManager

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16934 )

Change subject: [util] fix another lock contention in MaintenanceManager
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63b12dd3641ef655f8fcbbad8d8ac515d874c0fb
Gerrit-Change-Number: 16934
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 11 Jan 2021 08:49:04 +
Gerrit-HasComments: No


[kudu-CR] [util] extra test for MaintenanceManager

2021-01-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16937 )

Change subject: [util] extra test for MaintenanceManager
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16937/1/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/16937/1/src/kudu/util/maintenance_manager-test.cc@748
PS1, Line 748: MonoDelta::FromMilliseconds(i % 8 + 1)
Is this longer sleep time important if the goal of the test is to test for when 
the stats computation is slow?


http://gerrit.cloudera.org:8080/#/c/16937/1/src/kudu/util/maintenance_manager-test.cc@749
PS1, Line 749: MonoDelta::FromMicroseconds(5)
>From the results in the commit message, there wasn't a big difference in this 
>reported timings. I'm curious whether you experimented with larger values, 
>especially given the issue at hand stems from larger update stats times from 
>computing the best compaction.



--
To view, visit http://gerrit.cloudera.org:8080/16937
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9e71b213583c000d4809dcfc885c1d31b3bb9d5
Gerrit-Change-Number: 16937
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 11 Jan 2021 08:48:43 +
Gerrit-HasComments: Yes