[kudu-CR] KUDU-2612 tablet servers automatically register txn participants

2021-02-26 Thread Alexey Serbin (Code Review)
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17037

to look at the new patch set (#12).

Change subject: KUDU-2612 tablet servers automatically register txn participants
..

KUDU-2612 tablet servers automatically register txn participants

With this patch, tablet servers automatically register tablets as
transaction participants and issue appropriate BEGIN_TXN operations
upon receiving write operations targeting tablet replicas which they
host.

Internally, the newly introduced logic is mostly embedded into the
TxnOpDispatcher class.  Every TxnOpDispatcher object is allowed to
accumulate up to a certain number of pending write requests in its queue
while awaiting for the completion of the preliminary tasks mentioned
above.  Once the TxnOpDispatcher's queue is at capacity, a tablet server
starts responding with the ErrorStatusPB::ERROR_SERVER_TOO_BUSY error
code to incoming write requests in the context of the corresponding
transaction, and Kudu clients automatically retry requests, as expected.
The capacity of the TxnOpDispatcher's queue is controlled by a newly
introduced --tablet_max_pending_txn_write_ops flag.  By default, the
flag is set to 2.  Buffering a few write requests should help to avoid
needless retries in case if a client packs many operations into one
write request: that's exactly the behavior for Kudu client sessions
using the AUTO_FLUSH_BACKGROUND mode.  If such buffering isn't desired
for some reason, set --tablet_max_pending_txn_write_ops=0: in that case
a client will retry the very first operation sent to a tablet server
in the context of a transaction until the tablet server completes the
preliminary tasks mentioned above.  The flag has runtime semantics,
so no restart of a tablet server is required upon modification of the
flag's setting.

Each tablet replica maintains a txn_id --> TxnOpDispatcher map,
with an entry's lifecycle as below:
  * An entry is added upon receiving write request in the context
of a multi-row transaction.
  * An entry is removed upon applying either ParticipantOpPB::ABORT_TXN
or ParticipantOpPB::FINALIZE_COMMIT operation.
  * If a write request is received after transaction has been committed
or aborted, the entry is automatically removed once receiving
corresponding error response from any of the following components:
  ** from TxnStatusManager in at attempts to register a participant
 in the context of committed/aborted transaction
  ** from the replica itself in an attempt to add
 ParticipantOpPB::BEGIN_COMMIT operation
In other words, the system automatically gets rid of no-longer-needed
TxnOpDispatcher entries.

This patch also contains several test scenarios to cover the newly
introduced functionality.  I also updated other related tests to remove
the artificial registration of corresponding transaction participants,
so those tests now rely on the newly introduced functionality.

Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
A src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
12 files changed, 2,285 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/17037/12
--
To view, visit http://gerrit.cloudera.org:8080/17037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6
Gerrit-Change-Number: 17037
Gerrit-PatchSet: 12
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612 add perf scenario to TxnWriteOpsITest

2021-02-26 Thread Alexey Serbin (Code Review)
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17105

to look at the new patch set (#3).

Change subject: KUDU-2612 add perf scenario to TxnWriteOpsITest
..

KUDU-2612 add perf scenario to TxnWriteOpsITest

This patch adds WriteOpPerf scenario to TxnWriteOpsITest.  The new
scenario is to evaluate --tablet_max_pending_txn_write_ops setting for
tablet servers: it runs for a short time to count number of completed
Write RPCs in context of a transactional session.  The scenario focuses
on single-row write operations to pinpoint the latency of processing
txn write operations when performing registration of transaction
participants.  Probably, the current locking approach for the
TxnOpDispatcher's queue while submitting the accumulated operations
isn't optimal.  Apparently, the exponential back-off timing built into
the client's RPC retry logic is also an important factor: I saw
significant deviation in the number of completed RPCs from run to run.

Below are results averaged for 100 runs of the benchmark scenario with
varying --max_pending_txn_write_ops accordingly and the following
settings fixed:
  --prepare_connections_to_tservers=true
  --clients=8
  --sessions_per_client=1
  --benchmark_run_time_ms=50

I used the following script to get the accumulated results for
writes in a transactional context:
  for i in {0..99}; do
./bin/txn_write_ops-itest --gtest_filter='*WriteOpPerf' \
--max_pending_txn_write_ops= 2>&1 | grep 'write RPCs' | \
awk '{print $9}'; done | \
awk 'BEGIN {sum=0} {sum += $0} END {print sum}'

RELEASE build:
  non-transactional  : 581.24 write RPCs
  --max_pending_txn_write_ops=0  : 442.13 txn write RPCs
  --max_pending_txn_write_ops=2  : 494.33 txn write RPCs
  --max_pending_txn_write_ops=5  : 471.90 txn write RPCs
  --max_pending_txn_write_ops=10 : 490.22 txn write RPCs
  --max_pending_txn_write_ops=20 : 469.21 txn write RPCs

DEBUG   build:
  non-transactional  : 150.24write RPCs
  --max_pending_txn_write_ops=0  : 83.74 txn write RPCs
  --max_pending_txn_write_ops=2  : 98.18 txn write RPCs
  --max_pending_txn_write_ops=5  : 95.23 txn write RPCs
  --max_pending_txn_write_ops=10 : 98.12 txn write RPCs
  --max_pending_txn_write_ops=20 : 94.40 txn write RPCs

Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4
---
M src/kudu/integration-tests/txn_write_ops-itest.cc
1 file changed, 190 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/17105/3
--
To view, visit http://gerrit.cloudera.org:8080/17105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4
Gerrit-Change-Number: 17105
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [multi-master-test] fix compilation warning

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17133


Change subject: [multi-master-test] fix compilation warning
..

[multi-master-test] fix compilation warning

Replaced INSTANTIATE_TEST_CASE_P with INSTANTIATE_TEST_SUITE_P
in dynamic_multi_master-test.cc to address compilation warning:

src/kudu/master/dynamic_multi_master-test.cc:833:1: \
  warning: 'InstantiateTestCase_P_IsDeprecated' is deprecated: \
  INSTANTIATE_TEST_CASE_P is deprecated, please use \
  INSTANTIATE_TEST_SUITE_P [-Wdeprecated-declarations]

Change-Id: Ibc2507082e57ada9aea1bdaa7fb0eb11c5e2ced2
---
M src/kudu/master/dynamic_multi_master-test.cc
1 file changed, 8 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/17133/1
--
To view, visit http://gerrit.cloudera.org:8080/17133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc2507082e57ada9aea1bdaa7fb0eb11c5e2ced2
Gerrit-Change-Number: 17133
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 


[kudu-CR] KUDU-2612 tablet servers automatically register txn participants

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17037 )

Change subject: KUDU-2612 tablet servers automatically register txn participants
..


Patch Set 9:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

PS9:
> I think it's better to move these new test scenarios into txn_write_ops-ite
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc
File src/kudu/integration-tests/txn_write_ops-itest.cc:

PS9:
> Could you also add a test in which we delete a replica that has pending ops
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc@446
PS9, Line 446: GetSingleTxnOpDispatcher() {
> nit: indent by 4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc@197
PS9, Line 197:   
RETURN_NOT_OK(tablet_replica_->UnregisterTxnOpDispatcher(txn_id()));
> Apply() cannot fail, see L234 for the CHECK_OK(). I suspect we should be ab
Exactly -- as we discussed offline, the proper place for this check is 
ParticipantOp::Prepare()

Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h@443
PS9, Line 443: let
> nit: give
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602
PS9, Line 602: this->RegisteredParticipantCb(txn_id, s);
> Answering my own question, no I don't think it does. That said, we could pr
Ack.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602
PS9, Line 602: this->RegisteredParticipantCb(txn_id, s);
> This gets set here and in the ParticipantTxnBeginCallback. Does that mean i
It's being called once, as far as I can see.

Anyway, it was messy due to iterative approach (I tried several options); I 
cleaned this up a bit.  Should be more straightforward now.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@607
PS9, Line 607: SetRegistrationScheduled
> SetPreliminaryTasksScheduled()?
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1179
PS9, Line 1179: TxnParticipantBegi
> nit: BeginTxnParticipantOp? Same in the callback?
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1208
PS9, Line 1208: CancelPendingTxnOps
> Is it possible for CancelPendingTxnOps to be called twice if BEGIN_TXN op f
Nope, it's called only once as I can see.  Even if it was called twice, the 
second time would be a no-op because the queue of pending ops was empty.

Anyways, I cleaned it up bit -- it should be more straightforward now.  I agree 
it was a bit messy in this revision.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1260
PS9, Line 1260: preliminary_tasks_scheduled_
> If the main purpose of this is to ensure we don't have multiple registratio
It would if it were guaranteed that the caller would
 (1) always try to schedule those tasks
 (2) always succeed in registering tasks if it tries to do so

(1) isn't guaranteed if looking at that from API perspective.  (2) isn't 
guaranteed because TabletReplica::SubmitTxnWrite() can fail with non-OK status. 
 I agree that the current approach doesn't look particularly good but telying 
on the logic of existing call sites would make even worse.

But I might be missing the essence of your question. Probably you meant just to 
call the scheduler functor here and remove that call-back from the upper level 
to set 'preliminary_tasks_scheduled_'?  I indeed removed the extra level of 
indirection, and now the scheduler function is called right from here.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1362
PS9, Line 1362: Cleanup
> nit: maybe call this RespondFailures() or something to make it more obvious
As for the holding the lock, I don't think it's needed here because this method 
isn't supposed to be called while inflight_status_ is to change concurrently.  
But it's a good call: it looks suspicious.  To make it cleaner, I made this 
method static.

Renamed into RespondWithStatus(), added DCHECK() for the status parameter at 
the call sites.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1166
PS9, Line 1166:
  :   if 
(PREDICT_FALSE(FLAGS_txn_participant_begin_op_inject_latency_ms > 0)) {
 

[kudu-CR] KUDU-2612 add perf scenario to TxnWriteOpsITest

2021-02-26 Thread Alexey Serbin (Code Review)
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17105

to look at the new patch set (#2).

Change subject: KUDU-2612 add perf scenario to TxnWriteOpsITest
..

KUDU-2612 add perf scenario to TxnWriteOpsITest

This patch adds TxnWriteOpPerf scenario to TxnWriteOpsITest.  The new
scenario is to evaluate --tablet_max_pending_txn_write_ops setting for
tablet servers: it runs for a short time to count number of completed
Write RPCs in context of a transactional session.  The scenario focuses
on single-row write operations to pinpoint the latency of processing
txn write operations when performing registration of transaction
participants.  Probably, the current locking approach for the
TxnOpDispatcher's queue while submitting the accumulated operations
isn't optimal.  Apparently, the exponential back-off timing built into
the client's RPC retry logic is also an important factor: I saw
significant deviation in the number of completed RPCs from run to run.

Below are results averaged for 100 runs of the benchmark scenario with
varying --max_pending_txn_write_ops accordingly and the following
settings fixed:
  --prepare_connections_to_tservers=true
  --clients=8
  --txn_sessions_per_client=1
  --benchmark_run_time_ms=50

I used the following script to get the accumulated results:
  for i in {0..99}; do
./bin/txn_write_ops-itest --gtest_filter='*TxnWriteOpPerf' \
--max_pending_txn_write_ops= 2>&1 | grep 'Txn write RPCs' | \
awk '{print $9}'; done | \
awk 'BEGIN {sum=0} {sum += $0} END {print sum}'

RELEASE build:
  --max_pending_txn_write_ops=0  : 442.13 Txn write RPCs
  --max_pending_txn_write_ops=2  : 494.33 Txn write RPCs
  --max_pending_txn_write_ops=5  : 471.90 Txn write RPCs
  --max_pending_txn_write_ops=10 : 490.22 Txn write RPCs
  --max_pending_txn_write_ops=20 : 469.21 Txn write RPCs

DEBUG   build:
  --max_pending_txn_write_ops=0  : 83.74 Txn write RPCs
  --max_pending_txn_write_ops=2  : 98.18 Txn write RPCs
  --max_pending_txn_write_ops=5  : 95.23 Txn write RPCs
  --max_pending_txn_write_ops=10 : 98.12 Txn write RPCs
  --max_pending_txn_write_ops=20 : 94.40 Txn write RPCs

Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4
---
M src/kudu/integration-tests/txn_write_ops-itest.cc
1 file changed, 191 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/17105/2
--
To view, visit http://gerrit.cloudera.org:8080/17105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4
Gerrit-Change-Number: 17105
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612 tablet servers automatically register txn participants

2021-02-26 Thread Alexey Serbin (Code Review)
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17037

to look at the new patch set (#11).

Change subject: KUDU-2612 tablet servers automatically register txn participants
..

KUDU-2612 tablet servers automatically register txn participants

With this patch, tablet servers automatically register tablets as
transaction participants and issue appropriate BEGIN_TXN operations
upon receiving write operations targeting tablet replicas which they
host.

Internally, the newly introduced logic is mostly embedded into the
TxnOpDispatcher class.  Every TxnOpDispatcher object is allowed to
accumulate up to a certain number of pending write requests in its queue
while awaiting for the completion of the preliminary tasks mentioned
above.  Once the TxnOpDispatcher's queue is at capacity, a tablet server
starts responding with the ErrorStatusPB::ERROR_SERVER_TOO_BUSY error
code to incoming write requests in the context of the corresponding
transaction, and Kudu clients automatically retry requests, as expected.
The capacity of the TxnOpDispatcher's queue is controlled by a newly
introduced --tablet_max_pending_txn_write_ops flag.  By default, the
flag is set to 2.  Buffering a few write requests should help to avoid
needless retries in case if a client packs many operations into one
write request: that's exactly the behavior for Kudu client sessions
using the AUTO_FLUSH_BACKGROUND mode.  If such buffering isn't desired
for some reason, set --tablet_max_pending_txn_write_ops=0: in that case
a client will retry the very first operation sent to a tablet server
in the context of a transaction until the tablet server completes the
preliminary tasks mentioned above.  The flag has runtime semantics,
so no restart of a tablet server is required upon modification of the
flag's setting.

Each tablet replica maintains a txn_id --> TxnOpDispatcher map,
with an entry's lifecycle as below:
  * An entry is added upon receiving write request in the context
of a multi-row transaction.
  * An entry is removed upon applying either ParticipantOpPB::ABORT_TXN
or ParticipantOpPB::FINALIZE_COMMIT operation.
  * If a write request is received after transaction has been committed
or aborted, the entry is automatically removed once receiving
corresponding error response from any of the following components:
  ** from TxnStatusManager in at attempts to register a participant
 in the context of committed/aborted transaction
  ** from the replica itself in an attempt to add
 ParticipantOpPB::BEGIN_COMMIT operation
In other words, the system automatically gets rid of no-longer-needed
TxnOpDispatcher entries.

This patch also contains several test scenarios to cover the newly
introduced functionality.  I also updated other related tests to remove
the artificial registration of corresponding transaction participants,
so those tests now rely on the newly introduced functionality.

Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6
---
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
A src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
12 files changed, 2,283 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/17037/11
--
To view, visit http://gerrit.cloudera.org:8080/17037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6
Gerrit-Change-Number: 17037
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612 add perf scenario to TxnWriteOpsITest

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17105 )

Change subject: KUDU-2612 add perf scenario to TxnWriteOpsITest
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG@15
PS1, Line 15: Probablty
> nit: Probably
Done


http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG@17
PS1, Line 17: Apparently,
> How did you observe this?
Ah, indeed: I didn't mention that I saw big deviation in the number of 
completed RPCs from run to run.  I added this notion.


http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG@35
PS1, Line 35: RELEASE build:
> I guess the main piece to conclude from this is that we should definitely p
Yup.  I guess there is some room for optimization.



--
To view, visit http://gerrit.cloudera.org:8080/17105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4
Gerrit-Change-Number: 17105
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 27 Feb 2021 06:20:09 +
Gerrit-HasComments: Yes


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16752 )

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..

tablet: allow interleaving of row liveness between compaction input rows

It was previously true that when merging multiple rowsets, a row's
liveness always moved to the same or newer rowsets. I.e., if a row was
deleted and inserted again, the new insert would either land in the same
rowset from which it was deleted (e.g. if deleting from the MRS and
inserting back into the same MRS), or a rowset whose rows are newer than
the rowset being deleted from (e.g. if deleting from a DRS and inserting
into the MRS).

This invariant was upheld by various checks in compaction code. The
invariant is no longer true when considering transactional inserts.
Take the following example:
- ts=10: insert 'a' to the tablet's MRS
- ts=11: delete 'a' from the tablet's MRS
- begin txn
  - insert 'a' to a transactional MRS
  - ts=12: commit the transactional MRS
- ts=13: delete 'a' from the transactional MRS
- ts=14: insert 'a' to the tablet's MRS

In considering the row history for 'a' in the context of defining the
row's history, we're left with the following row histories in each
rowset, which serve as an input to our history-merging code:

  tablet MRS: UNDO(del@10) <- 'a' -> REDO(del@11) -> REDO(reins@14)
  txn MRS:UNDO(del@12) <- 'a' -> REDO(del@13)

Previously, our invariant meant that one of these rows entire history
(both UNDOs and REDOs) was entirely ahead of the other. This meant that
merging was a matter of determining which input is older (which must be
a deleted "ghost" row, since there can only be a single live row at a
time), converting the older row and its history into UNDOs, and merging
that UNDO history with the newer row's UNDOs. Given the above sequence,
defining an older row isn't as straightforward, given the overlapping of
the histories. This led to broken invariants in the form of CHECK
failures or incorrect results.

However, a notable obvservation is that there _is_ a discernable history
that does abide by our previously held invariant, if we transfer the
newer REDOs from the tablet's MRS input to the txn's MRS input:

  UNDO(del@10) <- 'a' -> REDO(del@11)
  UNDO(del@12) <- 'a' -> REDO(del@13) -> REDO(reins@14)

This transferral is safe because we still expect that only a single
instance of a row can be "live" at a time. I.e., if there is such
overlap, it is caused by the deletion of a row from the older rowset,
and in transferring the history, there is no risk of ending up with two
histories for the same row that both end with a live row.

This patch implements this transferral of history onto the newer input,
and adds some fuzz test cases that helped snuff out the aforementioned
CHECK failures and incorrect results.

It also enables transactional ops in fuzz-itest, which helped find this
issue. Without this patch, fuzz-itest with transactional support fails
2/10 times in DEBUG mode. With it, it has passed 1000/1000 times.

Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Reviewed-on: http://gerrit.cloudera.org:8080/16752
Reviewed-by: Alexey Serbin 
Tested-by: Andrew Wong 
---
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/mutation.h
5 files changed, 608 insertions(+), 81 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Andrew Wong: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16752 )

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 27 Feb 2021 00:14:26 +
Gerrit-HasComments: No


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16752 )

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..


Patch Set 10:

Unrelated test failure:
TokenBasedConnectionITest.ReacquireAuthnToken


--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 27 Feb 2021 00:14:22 +
Gerrit-HasComments: No


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has removed a vote on this change.

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16752 )

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 26 Feb 2021 23:08:32 +
Gerrit-HasComments: No


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16752

to look at the new patch set (#10).

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..

tablet: allow interleaving of row liveness between compaction input rows

It was previously true that when merging multiple rowsets, a row's
liveness always moved to the same or newer rowsets. I.e., if a row was
deleted and inserted again, the new insert would either land in the same
rowset from which it was deleted (e.g. if deleting from the MRS and
inserting back into the same MRS), or a rowset whose rows are newer than
the rowset being deleted from (e.g. if deleting from a DRS and inserting
into the MRS).

This invariant was upheld by various checks in compaction code. The
invariant is no longer true when considering transactional inserts.
Take the following example:
- ts=10: insert 'a' to the tablet's MRS
- ts=11: delete 'a' from the tablet's MRS
- begin txn
  - insert 'a' to a transactional MRS
  - ts=12: commit the transactional MRS
- ts=13: delete 'a' from the transactional MRS
- ts=14: insert 'a' to the tablet's MRS

In considering the row history for 'a' in the context of defining the
row's history, we're left with the following row histories in each
rowset, which serve as an input to our history-merging code:

  tablet MRS: UNDO(del@10) <- 'a' -> REDO(del@11) -> REDO(reins@14)
  txn MRS:UNDO(del@12) <- 'a' -> REDO(del@13)

Previously, our invariant meant that one of these rows entire history
(both UNDOs and REDOs) was entirely ahead of the other. This meant that
merging was a matter of determining which input is older (which must be
a deleted "ghost" row, since there can only be a single live row at a
time), converting the older row and its history into UNDOs, and merging
that UNDO history with the newer row's UNDOs. Given the above sequence,
defining an older row isn't as straightforward, given the overlapping of
the histories. This led to broken invariants in the form of CHECK
failures or incorrect results.

However, a notable obvservation is that there _is_ a discernable history
that does abide by our previously held invariant, if we transfer the
newer REDOs from the tablet's MRS input to the txn's MRS input:

  UNDO(del@10) <- 'a' -> REDO(del@11)
  UNDO(del@12) <- 'a' -> REDO(del@13) -> REDO(reins@14)

This transferral is safe because we still expect that only a single
instance of a row can be "live" at a time. I.e., if there is such
overlap, it is caused by the deletion of a row from the older rowset,
and in transferring the history, there is no risk of ending up with two
histories for the same row that both end with a live row.

This patch implements this transferral of history onto the newer input,
and adds some fuzz test cases that helped snuff out the aforementioned
CHECK failures and incorrect results.

It also enables transactional ops in fuzz-itest, which helped find this
issue. Without this patch, fuzz-itest with transactional support fails
2/10 times in DEBUG mode. With it, it has passed 1000/1000 times.

Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
---
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/mutation.h
5 files changed, 608 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16752/10
--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17128 )

Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel
..

[client-test] more robust TestRetrieveAuthzTokenInParallel

I saw a flakiness in the ClientTest.TestRetrieveAuthzTokenInParallel
scenario, where about 25 out of 256 runs failed when running with
--stress_cpu_threads=16:

  src/kudu/client/client-test.cc:6923: Failure
  Expected: (num_reqs) < (kThreads), actual: 21 vs 20

This patch addresses the issue by:
  * skipping the scenario when --stress_cpu_threads is not zero
  * skipping the scenario when running at single-core CPU machines
  * limiting the number of threads up to the number of CPU cores

Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7
Reviewed-on: http://gerrit.cloudera.org:8080/17128
Tested-by: Alexey Serbin 
Reviewed-by: Hao Hao 
Reviewed-by: Andrew Wong 
---
M src/kudu/client/client-test.cc
1 file changed, 15 insertions(+), 3 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Hao Hao: Looks good to me, approved
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17128
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7
Gerrit-Change-Number: 17128
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15
PS3, Line 15: This is expected to be a good balance
> That single process is itself not distributed in nature at that point and I
Thank you for the explanation.

Right: my concern was about distributing the IO load for requests issued from 
one client application.  We also discussed this offline, and I agree that even 
with current implementation we didn't have any provision for better 
distribution of such IO load if looking at this from the point of using Kudu 
C++ client in impalad which is collocated with kudu-tserver, and impalad always 
worked with the local kudu-tserver.  So, from that perspective we cannot talk 
about regressions anyways.


http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191
PS3, Line 191: still benefit from spreading the load across replicas
> I didn't have a good reason to suspect that per-client would necessarily be
SGTM.

We also discussed this offline. Given that Impala is most likely the major user 
of Kudu C++ client, it seems the risk of exposing some API that nobody uses is 
higher than finding some unexpected use case where the former affinity behavior 
would be a better fit than the newly implemented one.



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 22:11:52 +
Gerrit-HasComments: Yes


[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@118
PS3, Line 118: assertEquals(0, quiesceTserver.waitFor());
 :
> why do we need to call waitFor() twice here?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 22:10:42 +
Gerrit-HasComments: Yes


[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17128 )

Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17128
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7
Gerrit-Change-Number: 17128
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 22:10:29 +
Gerrit-HasComments: No


[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

2021-02-26 Thread Andrew Wong (Code Review)
Hello Attila Bukor, Kudu Jenkins, Grant Henke, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17124

to look at the new patch set (#4).

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
..

[java] KUDU-3213: try at different server on TABLET_NOT_RUNNING

Prior to this patch, if a tablet server were quiescing for a prolonged
period, scan requests could time out, complaining that the tablet server
is quiescing, but without ever retrying the scan at another tablet
server. This is because tablet servers will return TABLET_NOT_RUNNING to
clients when attempting a scan while quiescing. The behavior in the C++
client is that the location is then blacklisted and the request is
retried elsewhere. The behavior in the Java client, though, is that the
same location is retried until failure.

This patch addresses this by treating TABLET_NOT_RUNNING errors in the
Java client as we would for TABLET_NOT_FOUND, which is actually quite
similar to the handling for TABLET_NOT_RUNNING in the C++ client: the
location is invalidated for further attempts, and the request is retried
elsewhere.

Why not just have quiescing tablet servers return TABLET_NOT_FOUND,
then? TABLET_NOT_FOUND errors in the C++ client actually have some
behavior not present in the Java client: a tablet whose location is
invalidated with TABLET_NOT_FOUND in the C++ client will be required to
be looked up again, requiring a round trip to the master. This behavior
doesn't exist in the Java client, so I thought it easiest to piggyback
on TABLET_NOT_FOUND handling for now.

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 56 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/4
--
To view, visit http://gerrit.cloudera.org:8080/17124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..

KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run

In testing an in-flight patch[1], it was found that our current handling of
errors when participant registration hasn't completed can lead to
incorrect behavior: the path in which we handle NOT_FOUND errors while
committing expects that such errors only occur if the tablet has been
deleted, and we currently proceed with the commit. The incorrect
assumption here was that NOT_FOUND errors are only produced when the
tablet has been deleted, which is not the case.

This behavior will be amended in an upcoming patch[2], and we'll
actually abort the transaction on NOT_FOUND errors. Until then, this
patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND,
so at least the error type is consistent with the rest of the
participant op lifecycle errors.

[1] https://gerrit.cloudera.org/c/17037/
[2] https://gerrit.cloudera.org/c/17022

Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Reviewed-on: http://gerrit.cloudera.org:8080/17127
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
Reviewed-by: Hao Hao 
---
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.h
2 files changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved
  Hao Hao: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Thanks!  This sounds like a reasonable plan, indeed.
Yep, sounds good. I'll merge this and follow it up then.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 22:07:37 +
Gerrit-HasComments: Yes


[kudu-CR] k8s: minor updates to the StatefulSet

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17098 )

Change subject: k8s: minor updates to the StatefulSet
..

k8s: minor updates to the StatefulSet

I walked through some of the Kubernetes README and noticed the ports
were off. This updates to use the typical defaults we use for masters
and tservers.

It also updates to use 4 tservers, which is closer to what would be used
in production, given 3-4-3 replication.

Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611
Reviewed-on: http://gerrit.cloudera.org:8080/17098
Tested-by: Kudu Jenkins
Reviewed-by: Hao Hao 
Reviewed-by: Alexey Serbin 
---
M kubernetes/README.adoc
M kubernetes/kudu-services.yaml
M kubernetes/kudu-statefulset.yaml
3 files changed, 31 insertions(+), 25 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Hao Hao: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611
Gerrit-Change-Number: 17098
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] k8s: minor updates to the StatefulSet

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17098 )

Change subject: k8s: minor updates to the StatefulSet
..


Patch Set 1: Code-Review+2

Thank you for addressing these!


--
To view, visit http://gerrit.cloudera.org:8080/17098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611
Gerrit-Change-Number: 17098
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 21:58:05 +
Gerrit-HasComments: No


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17114 )

Change subject: test: add more natural test for KUDU-2233
..

test: add more natural test for KUDU-2233

I have a patch in-flight that touches an area of the code around where
we expect the infamous KUDU-2233 crash. Before merging it, I'd like to
ensure the graceful handling of this corruption is unaffected,
especially in cases where data has previously been corrupted and we've
just upgraded to a newer version of Kudu.

This patch adds such a test case, where data is corrupted but does not
result in a crash, and the tserver is restarted with bits that handle
corruption gracefully. In doing so, this patch also adds an off switch
to all of the guardrails we installed for KUDU-2233.

Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Reviewed-on: http://gerrit.cloudera.org:8080/17114
Reviewed-by: Alexey Serbin 
Tested-by: Andrew Wong 
---
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/timestamp_advancement-itest.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_replica.cc
7 files changed, 173 insertions(+), 22 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Andrew Wong: Verified

--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17114 )

Change subject: test: add more natural test for KUDU-2233
..


Patch Set 4: Verified+1

Unrelated failure of 
LocationAwareRebalancingParamTest.Rebalance/idx8_rf4_t8_l_3_1_1_1


--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 21:56:50 +
Gerrit-HasComments: No


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has removed a vote on this change.

Change subject: test: add more natural test for KUDU-2233
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 3: Code-Review+1

(3 comments)

Per the Slack discussion, would be good to get numbers making sure this 
improves things before merging if we can.

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@11
PS3, Line 11: ensure
nit: ensures


http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@12
PS3, Line 12: implimentation
nit: implementation


http://gerrit.cloudera.org:8080/#/c/17129/3/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java:

http://gerrit.cloudera.org:8080/#/c/17129/3/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java@66
PS3, Line 66:   private static final int kRandomSelectionInt = new 
Random().nextInt(Integer.MAX_VALUE);
nit: Google's Java styleguide suggests RANDOM_SELECTION_INT for constants. I 
suppose we're not that strict with following it though, given they suggest 
"logger" instead of "LOG"

https://google.github.io/styleguide/javaguide.html#s5-naming



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 21:56:27 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Ah great question. In that case, I think the correct course of action would
Thanks!  This sounds like a reasonable plan, indeed.

I guess we can keep this patch as is, and address the issue in a separate 
patch.  Whatever you think is best choice here.

Exactly: if the write request is retried and TxnOpDispatcher tries to register 
a participant and issue BEGIN_TXN for the participant again, it will fail at 
registering the participant.  Then TxnOpDispatcher will respond with 
appropriate error back to client, so the client will know that its write failed.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 21:39:51 +
Gerrit-HasComments: Yes


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17114 )

Change subject: test: add more natural test for KUDU-2233
..


Patch Set 4: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77
PS2, Line 77: #ifndef NDEBUG
> I am fine with the current code, but if Alexey thinks differentiate the fla
Thank you!


http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77
PS2, Line 77: #ifndef NDEBUG
> Done
Apart from performance concern (I guess most likely the performance difference 
is negligible), I was not sure whether we want to keep a flag which does 
nothing.  It was not a major point anyways, so I'm also good with either 
keeping it or removing.



--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 21:25:30 +
Gerrit-HasComments: Yes


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17114 )

Change subject: test: add more natural test for KUDU-2233
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77
PS2, Line 77: #ifndef NDEBUG
> I am fine with the current code, but if Alexey thinks differentiate the fla
Done



--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:59:58 +
Gerrit-HasComments: Yes


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17114

to look at the new patch set (#4).

Change subject: test: add more natural test for KUDU-2233
..

test: add more natural test for KUDU-2233

I have a patch in-flight that touches an area of the code around where
we expect the infamous KUDU-2233 crash. Before merging it, I'd like to
ensure the graceful handling of this corruption is unaffected,
especially in cases where data has previously been corrupted and we've
just upgraded to a newer version of Kudu.

This patch adds such a test case, where data is corrupted but does not
result in a crash, and the tserver is restarted with bits that handle
corruption gracefully. In doing so, this patch also adds an off switch
to all of the guardrails we installed for KUDU-2233.

Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
---
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/timestamp_advancement-itest.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_replica.cc
7 files changed, 173 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/17114/4
--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel

2021-02-26 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17128 )

Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17128
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7
Gerrit-Change-Number: 17128
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:28:16 +
Gerrit-HasComments: No


[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

2021-02-26 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@118
PS3, Line 118: quiesceTserver.waitFor();
 : assertEquals(0, quiesceTserver.waitFor()
why do we need to call waitFor() twice here?



--
To view, visit http://gerrit.cloudera.org:8080/17124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:25:15 +
Gerrit-HasComments: Yes


[kudu-CR] k8s: minor updates to the StatefulSet

2021-02-26 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17098 )

Change subject: k8s: minor updates to the StatefulSet
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611
Gerrit-Change-Number: 17098
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:12:13 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15
PS3, Line 15: This is expected to be a good balance
> How do we know?  What if a single process does a lot of full table scans?
That single process is itself not distributed in nature at that point and I 
would expect if it is doing multiple full scans of the same tablet it would be 
a good choice to pick the already warm tablet.

IMPALA-10481 describes why our current choice is suboptimal and this change 
aims to address that concern. We do plan to performance test and validate this 
change Impala side as well.

The one case I could see where a single process might want to use multiple 
tablet replicas is the case where the tablet splitting feature is used on the 
same process, though there hasn't been a lot of performance evaluation of that 
yet. IMPALA-9792 is still open tracking that evaluation from the Impala side 
(Which is the only known usage today).


http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191
PS3, Line 191: still benefit from spreading the load across replicas
> I meant we already have a recommendation to have just a single client per a
I didn't have a good reason to suspect that per-client would necessarily be a 
better choice than per process. Especially considering we often see a single 
client per process anyway.

The benefit of per process is that it matches the current java behavior and it 
means that multiple clients can be used without impacting the selection 
affinity on a given process. I could see a world where both is beneficial but 
didn't want to overcomplicate the change.

I think the "ultimate" fix would be to allow to user to provide their own 
"seed" for replica selection randomness. I think that could be a useful follow 
on change to track with a jira.



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:09:00 +
Gerrit-HasComments: Yes


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17114 )

Change subject: test: add more natural test for KUDU-2233
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77
PS2, Line 77: DEFINE_bool(dcheck_on_kudu_2233_invariants, true
> Yes, this one different as I can see: it's not quite used in NDEBUG case, w
I am fine with the current code, but if Alexey thinks differentiate the flag to 
be not exposed in NDEBUG case can help with the performance of execution. Then 
I am also good with that.



--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:03:04 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Right: I meant the case when a participant is registered, but BEGIN_TXN has
Ah great question. In that case, I think the correct course of action would be 
to treat the non-existent transaction on the participant as OK, since the end 
result is still that the transaction doesn't exist once the ABORT_TXN returns, 
and I can update this patch and my following to reflect that. Then, the 
question is what happens if the write is retried after that happens, and the 
answer there is that the new dispatcher's call to RegisterParticipant would 
fail.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:02:43 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:56:59 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191
PS3, Line 191: still benefit from spreading the load across replicas
> Why not to have this per client?  I think that making it the same for all c
I meant we already have a recommendation to have just a single client per 
application.  So, if for some reason it it's necessary to escape from this 
affinity in the same application, keeping this per-process makes it impossible. 
 However, if this were an affinity per client instance, then in case if this 
affinity is not desired, at least there is a choice to instantiate multiple 
clients.

What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:55:47 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15
PS3, Line 15: This is expected to be a good balance
How do we know?  What if a single process does a lot of full table scans?


http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191
PS3, Line 191: still benefit from spreading the load across replicas
Why not to have this per client?  I think that making it the same for all 
clients in the same process is rather a restriction.



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:48:26 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@126
PS2, Line 126: ));
> nit: drop
Done


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134
PS2, Line 134:   while (1) {
> Better to move this close where it's first used in SelectTServer() function
Done


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134
PS2, Line 134:  {
> nit: kNamingConvention? per https://google.github.io/styleguide/cppguide.ht
Done


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134
PS2, Line 134:   whil
> nit: could also be const
Done


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2614
PS2, Line 2614: // Marking stale forces a lookup.
> nit: maybe comment that marking staleness forces a lookup?
Done


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2637
PS2, Line 2637: RemoteT
> Should be c2?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:39:56 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Grant Henke (Code Review)
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17129

to look at the new patch set (#3).

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..

KUDU-3248: Match C++ replica selection behavior of Java client

The C++ client currently uses a new random value every time
a replica is selected. Alternatively, the Java client uses a static
value that ensure the selection for a single process or application
remains deterministic. This patch adjusts the C++ implimentation
to match the Java client behavior.

This is expected to be a good balance of efficient use of cache
memory and distribution of load across the replicas given a
single process will always get the same choice resulting in
cache hits for follow up scans, but seperate processes
will get a potentially different random selection.

The reviews on the Java patch here have some good context
and discussion: https://gerrit.cloudera.org/#/c/12158/

Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M src/kudu/client/client-internal.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/common/common.proto
5 files changed, 86 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/17129/3
--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@126
PS2, Line 126: providing
nit: drop


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134
PS2, Line 134: static
nit: could also be const


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134
PS2, Line 134: random_selection_int
nit: kNamingConvention? per 
https://google.github.io/styleguide/cppguide.html#Constant_Names


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2614
PS2, Line 2614: rt->MarkStale();
nit: maybe comment that marking staleness forces a lookup?


http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2637
PS2, Line 2637: client_
Should be c2?



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:18:23 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134
PS2, Line 134: static int random_selection_int = InitRandomSelectionInt();
Better to move this close where it's first used in SelectTServer() function.
Also can further qualify with const.



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:10:58 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Grant Henke (Code Review)
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17129

to look at the new patch set (#2).

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..

KUDU-3248: Match C++ replica selection behavior of Java client

The C++ client currently uses a new random value every time
a replica is selected. Alternatively, the Java client uses a static
value that ensure the selection for a single process or application
remains deterministic. This patch adjusts the C++ implimentation
to match the Java client behavior.

This is expected to be a good balance of efficient use of cache
memory and distribution of load across the replicas given a
single process will always get the same choice resulting in
cache hits for follow up scans, but seperate processes
will get a potentially different random selection.

The reviews on the Java patch here have some good context
and discussion: https://gerrit.cloudera.org/#/c/12158/

Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M src/kudu/client/client-internal.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/common/common.proto
5 files changed, 84 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/17129/2
--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@124
PS1, Line 124: GoogleOnceType once;
 : // This random integer is used when making any random choice for 
replica
 : // selection. It is static to provide a deterministic selection 
for any given
 : // process and therefore also providing better cache affinity 
while ensuring
 : // that we can still benefit from spreading the load across 
replicas for
 : // other processes and applications.
 : int random_selection_int;
 :
 : void InitRandomSelectionInt() {
 :   std::random_device rdev;
 :   std::mt19937 gen(rdev());
 :   random_selection_int = gen();
 : }
 :
 : }
> Since this is client-internal.cc, C++11 and above is permitted so can use s
I will use static initialization as mentioned below.


http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@270
PS1, Line 270:   if (!local.empty()) {
 : ret = local[random_selection_int % local.size()];
 :   } else if (!same_location.empty()) {
 : ret = same_location[random_selection_int % 
same_location.size()];
 :   } else if (!filtered.empty()) {
 : ret = filtered[random_selection_int % filtered.size()];
 :   }
> Considering random_selection_int is used only in this function, std::once i
Awesome! I wasn't totally clear on that behavior given my lack of C++ expertise 
and wen't with what I saw other locations in the code base using. I am glad I 
can switch to something more straight forward.



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 18:53:30 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17129 )

Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@124
PS1, Line 124: GoogleOnceType once;
 : // This random integer is used when making any random choice for 
replica
 : // selection. It is static to provide a deterministic selection 
for any given
 : // process and therefore also providing better cache affinity 
while ensuring
 : // that we can still benefit from spreading the load across 
replicas for
 : // other processes and applications.
 : int random_selection_int;
 :
 : void InitRandomSelectionInt() {
 :   std::random_device rdev;
 :   std::mt19937 gen(rdev());
 :   random_selection_int = gen();
 : }
 :
 : }
Since this is client-internal.cc, C++11 and above is permitted so can use 
std::once instead. Already see unique_ptr and shared_ptr being used in this 
file.


http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@270
PS1, Line 270:   if (!local.empty()) {
 : ret = local[random_selection_int % local.size()];
 :   } else if (!same_location.empty()) {
 : ret = same_location[random_selection_int % 
same_location.size()];
 :   } else if (!filtered.empty()) {
 : ret = filtered[random_selection_int % filtered.size()];
 :   }
Considering random_selection_int is used only in this function, std::once is 
not necessary as static initialization is done just once and thread safe from 
C++11 onwards.
static int random_selection_int = InitRandomSelectionInt();



--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 18:45:54 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock

2021-02-26 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17097 )

Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock
..


Patch Set 3:

(2 comments)

Not a review. Just passing by...

http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h
File src/kudu/tablet/lock_manager.h:

http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h@19
PS3, Line 19: #include 
Nit: Can use the C++ compatible  instead.


http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h@157
PS3, Line 157: PartitionLock
I find it weird that a class named *Lock doesn't have any lock/unlock/release 
methods associated with it explicitly or implicitly and doesn't inherit or 
compose a lock primitive. This might as well be a typedef to TxnId.



--
To view, visit http://gerrit.cloudera.org:8080/17097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1
Gerrit-Change-Number: 17097
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 26 Feb 2021 18:23:07 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17097 )

Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock
..


Patch Set 3: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17097/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17097/3//COMMIT_MSG@12
PS3, Line 12: hold
nit: acquire?


http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager-test.cc
File src/kudu/tablet/lock_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager-test.cc@195
PS3, Line 195: threads.emplace_back([&] {
> warning: 'emplace_back' is called inside a loop; consider pre-allocating th
nit: since lock acquisition is quite fast operation, maybe it's worth 
synchronizing the start of the threads' activity by a barrier?


http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager-test.cc@208
PS3, Line 208: TEST_F(LockManagerTest, 
TestConcurrentLockUnlockPartitionWithInvalidID) {
nit: since lock acquisition is quite fast operation, maybe it's worth 
synchronizing the start of the threads' activity by a barrier?



--
To view, visit http://gerrit.cloudera.org:8080/17097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1
Gerrit-Change-Number: 17097
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 26 Feb 2021 18:07:44 +
Gerrit-HasComments: Yes


[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16752 )

Change subject: tablet: allow interleaving of row liveness between compaction 
input rows
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/16752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a
Gerrit-Change-Number: 16752
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 26 Feb 2021 16:56:41 +
Gerrit-HasComments: No


[kudu-CR] test: add more natural test for KUDU-2233

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17114 )

Change subject: test: add more natural test for KUDU-2233
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77
PS2, Line 77: DEFINE_bool(dcheck_on_kudu_2233_invariants, true
> I can surround this with some ifndef sure, but we do have several test-only
Yes, this one different as I can see: it's not quite used in NDEBUG case, while 
other test-only flags are used in both debug and NDEBUG case.

At lines 308-311 and 316-319 the usage of this flag reduces to an empty if() 
clause in NDEBUG mode.  Why to keep such a flag then in NDEBUG case?



--
To view, visit http://gerrit.cloudera.org:8080/17114
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa
Gerrit-Change-Number: 17114
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 16:56:12 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client

2021-02-26 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17129


Change subject: KUDU-3248: Match C++ replica selection behavior of Java client
..

KUDU-3248: Match C++ replica selection behavior of Java client

The C++ client currently uses a new random value every time
a replica is selected. Alternatively, the Java client uses a static
value that ensure the selection for a single process or application
remains deterministic. This patch adjusts the C++ implimentation
to match the Java client behavior.

This is expected to be a good balance of efficient use of cache
memory and distribution of load across the replicas given a
single process will always get the same choice resulting in
cache hits for follow up scans, but seperate processes
will get a potentially different random selection.

The reviews on the Java patch here have some good context
and discussion: https://gerrit.cloudera.org/#/c/12158/

Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M src/kudu/client/client-internal.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/common/common.proto
5 files changed, 87 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/17129/1
--
To view, visit http://gerrit.cloudera.org:8080/17129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04
Gerrit-Change-Number: 17129
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke