[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17037 to look at the new patch set (#12). Change subject: KUDU-2612 tablet servers automatically register txn participants .. KUDU-2612 tablet servers automatically register txn participants With this patch, tablet servers automatically register tablets as transaction participants and issue appropriate BEGIN_TXN operations upon receiving write operations targeting tablet replicas which they host. Internally, the newly introduced logic is mostly embedded into the TxnOpDispatcher class. Every TxnOpDispatcher object is allowed to accumulate up to a certain number of pending write requests in its queue while awaiting for the completion of the preliminary tasks mentioned above. Once the TxnOpDispatcher's queue is at capacity, a tablet server starts responding with the ErrorStatusPB::ERROR_SERVER_TOO_BUSY error code to incoming write requests in the context of the corresponding transaction, and Kudu clients automatically retry requests, as expected. The capacity of the TxnOpDispatcher's queue is controlled by a newly introduced --tablet_max_pending_txn_write_ops flag. By default, the flag is set to 2. Buffering a few write requests should help to avoid needless retries in case if a client packs many operations into one write request: that's exactly the behavior for Kudu client sessions using the AUTO_FLUSH_BACKGROUND mode. If such buffering isn't desired for some reason, set --tablet_max_pending_txn_write_ops=0: in that case a client will retry the very first operation sent to a tablet server in the context of a transaction until the tablet server completes the preliminary tasks mentioned above. The flag has runtime semantics, so no restart of a tablet server is required upon modification of the flag's setting. Each tablet replica maintains a txn_id --> TxnOpDispatcher map, with an entry's lifecycle as below: * An entry is added upon receiving write request in the context of a multi-row transaction. * An entry is removed upon applying either ParticipantOpPB::ABORT_TXN or ParticipantOpPB::FINALIZE_COMMIT operation. * If a write request is received after transaction has been committed or aborted, the entry is automatically removed once receiving corresponding error response from any of the following components: ** from TxnStatusManager in at attempts to register a participant in the context of committed/aborted transaction ** from the replica itself in an attempt to add ParticipantOpPB::BEGIN_COMMIT operation In other words, the system automatically gets rid of no-longer-needed TxnOpDispatcher entries. This patch also contains several test scenarios to cover the newly introduced functionality. I also updated other related tests to remove the artificial registration of corresponding transaction participants, so those tests now rely on the newly introduced functionality. Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 --- M src/kudu/client/batcher.cc M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc A src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 12 files changed, 2,285 insertions(+), 127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/17037/12 -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 12 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612 add perf scenario to TxnWriteOpsITest
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17105 to look at the new patch set (#3). Change subject: KUDU-2612 add perf scenario to TxnWriteOpsITest .. KUDU-2612 add perf scenario to TxnWriteOpsITest This patch adds WriteOpPerf scenario to TxnWriteOpsITest. The new scenario is to evaluate --tablet_max_pending_txn_write_ops setting for tablet servers: it runs for a short time to count number of completed Write RPCs in context of a transactional session. The scenario focuses on single-row write operations to pinpoint the latency of processing txn write operations when performing registration of transaction participants. Probably, the current locking approach for the TxnOpDispatcher's queue while submitting the accumulated operations isn't optimal. Apparently, the exponential back-off timing built into the client's RPC retry logic is also an important factor: I saw significant deviation in the number of completed RPCs from run to run. Below are results averaged for 100 runs of the benchmark scenario with varying --max_pending_txn_write_ops accordingly and the following settings fixed: --prepare_connections_to_tservers=true --clients=8 --sessions_per_client=1 --benchmark_run_time_ms=50 I used the following script to get the accumulated results for writes in a transactional context: for i in {0..99}; do ./bin/txn_write_ops-itest --gtest_filter='*WriteOpPerf' \ --max_pending_txn_write_ops= 2>&1 | grep 'write RPCs' | \ awk '{print $9}'; done | \ awk 'BEGIN {sum=0} {sum += $0} END {print sum}' RELEASE build: non-transactional : 581.24 write RPCs --max_pending_txn_write_ops=0 : 442.13 txn write RPCs --max_pending_txn_write_ops=2 : 494.33 txn write RPCs --max_pending_txn_write_ops=5 : 471.90 txn write RPCs --max_pending_txn_write_ops=10 : 490.22 txn write RPCs --max_pending_txn_write_ops=20 : 469.21 txn write RPCs DEBUG build: non-transactional : 150.24write RPCs --max_pending_txn_write_ops=0 : 83.74 txn write RPCs --max_pending_txn_write_ops=2 : 98.18 txn write RPCs --max_pending_txn_write_ops=5 : 95.23 txn write RPCs --max_pending_txn_write_ops=10 : 98.12 txn write RPCs --max_pending_txn_write_ops=20 : 94.40 txn write RPCs Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4 --- M src/kudu/integration-tests/txn_write_ops-itest.cc 1 file changed, 190 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/17105/3 -- To view, visit http://gerrit.cloudera.org:8080/17105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4 Gerrit-Change-Number: 17105 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [multi-master-test] fix compilation warning
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17133 Change subject: [multi-master-test] fix compilation warning .. [multi-master-test] fix compilation warning Replaced INSTANTIATE_TEST_CASE_P with INSTANTIATE_TEST_SUITE_P in dynamic_multi_master-test.cc to address compilation warning: src/kudu/master/dynamic_multi_master-test.cc:833:1: \ warning: 'InstantiateTestCase_P_IsDeprecated' is deprecated: \ INSTANTIATE_TEST_CASE_P is deprecated, please use \ INSTANTIATE_TEST_SUITE_P [-Wdeprecated-declarations] Change-Id: Ibc2507082e57ada9aea1bdaa7fb0eb11c5e2ced2 --- M src/kudu/master/dynamic_multi_master-test.cc 1 file changed, 8 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/17133/1 -- To view, visit http://gerrit.cloudera.org:8080/17133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibc2507082e57ada9aea1bdaa7fb0eb11c5e2ced2 Gerrit-Change-Number: 17133 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 ) Change subject: KUDU-2612 tablet servers automatically register txn participants .. Patch Set 9: (14 comments) http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS9: > I think it's better to move these new test scenarios into txn_write_ops-ite Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: PS9: > Could you also add a test in which we delete a replica that has pending ops Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc@446 PS9, Line 446: GetSingleTxnOpDispatcher() { > nit: indent by 4 spaces Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc@197 PS9, Line 197: RETURN_NOT_OK(tablet_replica_->UnregisterTxnOpDispatcher(txn_id())); > Apply() cannot fail, see L234 for the CHECK_OK(). I suspect we should be ab Exactly -- as we discussed offline, the proper place for this check is ParticipantOp::Prepare() Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h@443 PS9, Line 443: let > nit: give Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602 PS9, Line 602: this->RegisteredParticipantCb(txn_id, s); > Answering my own question, no I don't think it does. That said, we could pr Ack. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602 PS9, Line 602: this->RegisteredParticipantCb(txn_id, s); > This gets set here and in the ParticipantTxnBeginCallback. Does that mean i It's being called once, as far as I can see. Anyway, it was messy due to iterative approach (I tried several options); I cleaned this up a bit. Should be more straightforward now. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@607 PS9, Line 607: SetRegistrationScheduled > SetPreliminaryTasksScheduled()? Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1179 PS9, Line 1179: TxnParticipantBegi > nit: BeginTxnParticipantOp? Same in the callback? Done http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1208 PS9, Line 1208: CancelPendingTxnOps > Is it possible for CancelPendingTxnOps to be called twice if BEGIN_TXN op f Nope, it's called only once as I can see. Even if it was called twice, the second time would be a no-op because the queue of pending ops was empty. Anyways, I cleaned it up bit -- it should be more straightforward now. I agree it was a bit messy in this revision. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1260 PS9, Line 1260: preliminary_tasks_scheduled_ > If the main purpose of this is to ensure we don't have multiple registratio It would if it were guaranteed that the caller would (1) always try to schedule those tasks (2) always succeed in registering tasks if it tries to do so (1) isn't guaranteed if looking at that from API perspective. (2) isn't guaranteed because TabletReplica::SubmitTxnWrite() can fail with non-OK status. I agree that the current approach doesn't look particularly good but telying on the logic of existing call sites would make even worse. But I might be missing the essence of your question. Probably you meant just to call the scheduler functor here and remove that call-back from the upper level to set 'preliminary_tasks_scheduled_'? I indeed removed the extra level of indirection, and now the scheduler function is called right from here. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1362 PS9, Line 1362: Cleanup > nit: maybe call this RespondFailures() or something to make it more obvious As for the holding the lock, I don't think it's needed here because this method isn't supposed to be called while inflight_status_ is to change concurrently. But it's a good call: it looks suspicious. To make it cleaner, I made this method static. Renamed into RespondWithStatus(), added DCHECK() for the status parameter at the call sites. http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1166 PS9, Line 1166: : if (PREDICT_FALSE(FLAGS_txn_participant_begin_op_inject_latency_ms > 0)) {
[kudu-CR] KUDU-2612 add perf scenario to TxnWriteOpsITest
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17105 to look at the new patch set (#2). Change subject: KUDU-2612 add perf scenario to TxnWriteOpsITest .. KUDU-2612 add perf scenario to TxnWriteOpsITest This patch adds TxnWriteOpPerf scenario to TxnWriteOpsITest. The new scenario is to evaluate --tablet_max_pending_txn_write_ops setting for tablet servers: it runs for a short time to count number of completed Write RPCs in context of a transactional session. The scenario focuses on single-row write operations to pinpoint the latency of processing txn write operations when performing registration of transaction participants. Probably, the current locking approach for the TxnOpDispatcher's queue while submitting the accumulated operations isn't optimal. Apparently, the exponential back-off timing built into the client's RPC retry logic is also an important factor: I saw significant deviation in the number of completed RPCs from run to run. Below are results averaged for 100 runs of the benchmark scenario with varying --max_pending_txn_write_ops accordingly and the following settings fixed: --prepare_connections_to_tservers=true --clients=8 --txn_sessions_per_client=1 --benchmark_run_time_ms=50 I used the following script to get the accumulated results: for i in {0..99}; do ./bin/txn_write_ops-itest --gtest_filter='*TxnWriteOpPerf' \ --max_pending_txn_write_ops= 2>&1 | grep 'Txn write RPCs' | \ awk '{print $9}'; done | \ awk 'BEGIN {sum=0} {sum += $0} END {print sum}' RELEASE build: --max_pending_txn_write_ops=0 : 442.13 Txn write RPCs --max_pending_txn_write_ops=2 : 494.33 Txn write RPCs --max_pending_txn_write_ops=5 : 471.90 Txn write RPCs --max_pending_txn_write_ops=10 : 490.22 Txn write RPCs --max_pending_txn_write_ops=20 : 469.21 Txn write RPCs DEBUG build: --max_pending_txn_write_ops=0 : 83.74 Txn write RPCs --max_pending_txn_write_ops=2 : 98.18 Txn write RPCs --max_pending_txn_write_ops=5 : 95.23 Txn write RPCs --max_pending_txn_write_ops=10 : 98.12 Txn write RPCs --max_pending_txn_write_ops=20 : 94.40 Txn write RPCs Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4 --- M src/kudu/integration-tests/txn_write_ops-itest.cc 1 file changed, 191 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/17105/2 -- To view, visit http://gerrit.cloudera.org:8080/17105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4 Gerrit-Change-Number: 17105 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612 tablet servers automatically register txn participants
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17037 to look at the new patch set (#11). Change subject: KUDU-2612 tablet servers automatically register txn participants .. KUDU-2612 tablet servers automatically register txn participants With this patch, tablet servers automatically register tablets as transaction participants and issue appropriate BEGIN_TXN operations upon receiving write operations targeting tablet replicas which they host. Internally, the newly introduced logic is mostly embedded into the TxnOpDispatcher class. Every TxnOpDispatcher object is allowed to accumulate up to a certain number of pending write requests in its queue while awaiting for the completion of the preliminary tasks mentioned above. Once the TxnOpDispatcher's queue is at capacity, a tablet server starts responding with the ErrorStatusPB::ERROR_SERVER_TOO_BUSY error code to incoming write requests in the context of the corresponding transaction, and Kudu clients automatically retry requests, as expected. The capacity of the TxnOpDispatcher's queue is controlled by a newly introduced --tablet_max_pending_txn_write_ops flag. By default, the flag is set to 2. Buffering a few write requests should help to avoid needless retries in case if a client packs many operations into one write request: that's exactly the behavior for Kudu client sessions using the AUTO_FLUSH_BACKGROUND mode. If such buffering isn't desired for some reason, set --tablet_max_pending_txn_write_ops=0: in that case a client will retry the very first operation sent to a tablet server in the context of a transaction until the tablet server completes the preliminary tasks mentioned above. The flag has runtime semantics, so no restart of a tablet server is required upon modification of the flag's setting. Each tablet replica maintains a txn_id --> TxnOpDispatcher map, with an entry's lifecycle as below: * An entry is added upon receiving write request in the context of a multi-row transaction. * An entry is removed upon applying either ParticipantOpPB::ABORT_TXN or ParticipantOpPB::FINALIZE_COMMIT operation. * If a write request is received after transaction has been committed or aborted, the entry is automatically removed once receiving corresponding error response from any of the following components: ** from TxnStatusManager in at attempts to register a participant in the context of committed/aborted transaction ** from the replica itself in an attempt to add ParticipantOpPB::BEGIN_COMMIT operation In other words, the system automatically gets rid of no-longer-needed TxnOpDispatcher entries. This patch also contains several test scenarios to cover the newly introduced functionality. I also updated other related tests to remove the artificial registration of corresponding transaction participants, so those tests now rely on the newly introduced functionality. Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 --- M src/kudu/client/batcher.cc M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/txn_commit-itest.cc A src/kudu/integration-tests/txn_write_ops-itest.cc M src/kudu/tablet/ops/participant_op.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 12 files changed, 2,283 insertions(+), 127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/17037/11 -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 11 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612 add perf scenario to TxnWriteOpsITest
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17105 ) Change subject: KUDU-2612 add perf scenario to TxnWriteOpsITest .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG@15 PS1, Line 15: Probablty > nit: Probably Done http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG@17 PS1, Line 17: Apparently, > How did you observe this? Ah, indeed: I didn't mention that I saw big deviation in the number of completed RPCs from run to run. I added this notion. http://gerrit.cloudera.org:8080/#/c/17105/1//COMMIT_MSG@35 PS1, Line 35: RELEASE build: > I guess the main piece to conclude from this is that we should definitely p Yup. I guess there is some room for optimization. -- To view, visit http://gerrit.cloudera.org:8080/17105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0370dbb289a4e1cfc154205ae92e13da510682b4 Gerrit-Change-Number: 17105 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 27 Feb 2021 06:20:09 + Gerrit-HasComments: Yes
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. tablet: allow interleaving of row liveness between compaction input rows It was previously true that when merging multiple rowsets, a row's liveness always moved to the same or newer rowsets. I.e., if a row was deleted and inserted again, the new insert would either land in the same rowset from which it was deleted (e.g. if deleting from the MRS and inserting back into the same MRS), or a rowset whose rows are newer than the rowset being deleted from (e.g. if deleting from a DRS and inserting into the MRS). This invariant was upheld by various checks in compaction code. The invariant is no longer true when considering transactional inserts. Take the following example: - ts=10: insert 'a' to the tablet's MRS - ts=11: delete 'a' from the tablet's MRS - begin txn - insert 'a' to a transactional MRS - ts=12: commit the transactional MRS - ts=13: delete 'a' from the transactional MRS - ts=14: insert 'a' to the tablet's MRS In considering the row history for 'a' in the context of defining the row's history, we're left with the following row histories in each rowset, which serve as an input to our history-merging code: tablet MRS: UNDO(del@10) <- 'a' -> REDO(del@11) -> REDO(reins@14) txn MRS:UNDO(del@12) <- 'a' -> REDO(del@13) Previously, our invariant meant that one of these rows entire history (both UNDOs and REDOs) was entirely ahead of the other. This meant that merging was a matter of determining which input is older (which must be a deleted "ghost" row, since there can only be a single live row at a time), converting the older row and its history into UNDOs, and merging that UNDO history with the newer row's UNDOs. Given the above sequence, defining an older row isn't as straightforward, given the overlapping of the histories. This led to broken invariants in the form of CHECK failures or incorrect results. However, a notable obvservation is that there _is_ a discernable history that does abide by our previously held invariant, if we transfer the newer REDOs from the tablet's MRS input to the txn's MRS input: UNDO(del@10) <- 'a' -> REDO(del@11) UNDO(del@12) <- 'a' -> REDO(del@13) -> REDO(reins@14) This transferral is safe because we still expect that only a single instance of a row can be "live" at a time. I.e., if there is such overlap, it is caused by the deletion of a row from the older rowset, and in transferring the history, there is no risk of ending up with two histories for the same row that both end with a live row. This patch implements this transferral of history onto the newer input, and adds some fuzz test cases that helped snuff out the aforementioned CHECK failures and incorrect results. It also enables transactional ops in fuzz-itest, which helped find this issue. Without this patch, fuzz-itest with transactional support fails 2/10 times in DEBUG mode. With it, it has passed 1000/1000 times. Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Reviewed-on: http://gerrit.cloudera.org:8080/16752 Reviewed-by: Alexey Serbin Tested-by: Andrew Wong --- M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/mutation.h 5 files changed, 608 insertions(+), 81 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Andrew Wong: Verified -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 27 Feb 2021 00:14:26 + Gerrit-HasComments: No
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Patch Set 10: Unrelated test failure: TokenBasedConnectionITest.ReacquireAuthnToken -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 27 Feb 2021 00:14:22 + Gerrit-HasComments: No
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Andrew Wong has removed a vote on this change. Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Feb 2021 23:08:32 + Gerrit-HasComments: No
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16752 to look at the new patch set (#10). Change subject: tablet: allow interleaving of row liveness between compaction input rows .. tablet: allow interleaving of row liveness between compaction input rows It was previously true that when merging multiple rowsets, a row's liveness always moved to the same or newer rowsets. I.e., if a row was deleted and inserted again, the new insert would either land in the same rowset from which it was deleted (e.g. if deleting from the MRS and inserting back into the same MRS), or a rowset whose rows are newer than the rowset being deleted from (e.g. if deleting from a DRS and inserting into the MRS). This invariant was upheld by various checks in compaction code. The invariant is no longer true when considering transactional inserts. Take the following example: - ts=10: insert 'a' to the tablet's MRS - ts=11: delete 'a' from the tablet's MRS - begin txn - insert 'a' to a transactional MRS - ts=12: commit the transactional MRS - ts=13: delete 'a' from the transactional MRS - ts=14: insert 'a' to the tablet's MRS In considering the row history for 'a' in the context of defining the row's history, we're left with the following row histories in each rowset, which serve as an input to our history-merging code: tablet MRS: UNDO(del@10) <- 'a' -> REDO(del@11) -> REDO(reins@14) txn MRS:UNDO(del@12) <- 'a' -> REDO(del@13) Previously, our invariant meant that one of these rows entire history (both UNDOs and REDOs) was entirely ahead of the other. This meant that merging was a matter of determining which input is older (which must be a deleted "ghost" row, since there can only be a single live row at a time), converting the older row and its history into UNDOs, and merging that UNDO history with the newer row's UNDOs. Given the above sequence, defining an older row isn't as straightforward, given the overlapping of the histories. This led to broken invariants in the form of CHECK failures or incorrect results. However, a notable obvservation is that there _is_ a discernable history that does abide by our previously held invariant, if we transfer the newer REDOs from the tablet's MRS input to the txn's MRS input: UNDO(del@10) <- 'a' -> REDO(del@11) UNDO(del@12) <- 'a' -> REDO(del@13) -> REDO(reins@14) This transferral is safe because we still expect that only a single instance of a row can be "live" at a time. I.e., if there is such overlap, it is caused by the deletion of a row from the older rowset, and in transferring the history, there is no risk of ending up with two histories for the same row that both end with a live row. This patch implements this transferral of history onto the newer input, and adds some fuzz test cases that helped snuff out the aforementioned CHECK failures and incorrect results. It also enables transactional ops in fuzz-itest, which helped find this issue. Without this patch, fuzz-itest with transactional support fails 2/10 times in DEBUG mode. With it, it has passed 1000/1000 times. Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a --- M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/compaction.h M src/kudu/tablet/mutation.h 5 files changed, 608 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16752/10 -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. [client-test] more robust TestRetrieveAuthzTokenInParallel I saw a flakiness in the ClientTest.TestRetrieveAuthzTokenInParallel scenario, where about 25 out of 256 runs failed when running with --stress_cpu_threads=16: src/kudu/client/client-test.cc:6923: Failure Expected: (num_reqs) < (kThreads), actual: 21 vs 20 This patch addresses the issue by: * skipping the scenario when --stress_cpu_threads is not zero * skipping the scenario when running at single-core CPU machines * limiting the number of threads up to the number of CPU cores Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Reviewed-on: http://gerrit.cloudera.org:8080/17128 Tested-by: Alexey Serbin Reviewed-by: Hao Hao Reviewed-by: Andrew Wong --- M src/kudu/client/client-test.cc 1 file changed, 15 insertions(+), 3 deletions(-) Approvals: Alexey Serbin: Verified Hao Hao: Looks good to me, approved Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15 PS3, Line 15: This is expected to be a good balance > That single process is itself not distributed in nature at that point and I Thank you for the explanation. Right: my concern was about distributing the IO load for requests issued from one client application. We also discussed this offline, and I agree that even with current implementation we didn't have any provision for better distribution of such IO load if looking at this from the point of using Kudu C++ client in impalad which is collocated with kudu-tserver, and impalad always worked with the local kudu-tserver. So, from that perspective we cannot talk about regressions anyways. http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191 PS3, Line 191: still benefit from spreading the load across replicas > I didn't have a good reason to suspect that per-client would necessarily be SGTM. We also discussed this offline. Given that Impala is most likely the major user of Kudu C++ client, it seems the risk of exposing some API that nobody uses is higher than finding some unexpected use case where the former affinity behavior would be a better fit than the newly implemented one. -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 22:11:52 + Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 ) Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@118 PS3, Line 118: assertEquals(0, quiesceTserver.waitFor()); : > why do we need to call waitFor() twice here? Done -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 22:10:42 + Gerrit-HasComments: Yes
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 22:10:29 + Gerrit-HasComments: No
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Hello Attila Bukor, Kudu Jenkins, Grant Henke, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17124 to look at the new patch set (#4). Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING Prior to this patch, if a tablet server were quiescing for a prolonged period, scan requests could time out, complaining that the tablet server is quiescing, but without ever retrying the scan at another tablet server. This is because tablet servers will return TABLET_NOT_RUNNING to clients when attempting a scan while quiescing. The behavior in the C++ client is that the location is then blacklisted and the request is retried elsewhere. The behavior in the Java client, though, is that the same location is retried until failure. This patch addresses this by treating TABLET_NOT_RUNNING errors in the Java client as we would for TABLET_NOT_FOUND, which is actually quite similar to the handling for TABLET_NOT_RUNNING in the C++ client: the location is invalidated for further attempts, and the request is retried elsewhere. Why not just have quiescing tablet servers return TABLET_NOT_FOUND, then? TABLET_NOT_FOUND errors in the C++ client actually have some behavior not present in the Java client: a tablet whose location is invalidated with TABLET_NOT_FOUND in the C++ client will be required to be looked up again, requiring a round trip to the master. This behavior doesn't exist in the Java client, so I thought it easiest to piggyback on TABLET_NOT_FOUND handling for now. Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java 3 files changed, 56 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/4 -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run In testing an in-flight patch[1], it was found that our current handling of errors when participant registration hasn't completed can lead to incorrect behavior: the path in which we handle NOT_FOUND errors while committing expects that such errors only occur if the tablet has been deleted, and we currently proceed with the commit. The incorrect assumption here was that NOT_FOUND errors are only produced when the tablet has been deleted, which is not the case. This behavior will be amended in an upcoming patch[2], and we'll actually abort the transaction on NOT_FOUND errors. Until then, this patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND, so at least the error type is consistent with the rest of the participant op lifecycle errors. [1] https://gerrit.cloudera.org/c/17037/ [2] https://gerrit.cloudera.org/c/17022 Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Reviewed-on: http://gerrit.cloudera.org:8080/17127 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin Reviewed-by: Hao Hao --- M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.h 2 files changed, 3 insertions(+), 3 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved Hao Hao: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Thanks! This sounds like a reasonable plan, indeed. Yep, sounds good. I'll merge this and follow it up then. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 22:07:37 + Gerrit-HasComments: Yes
[kudu-CR] k8s: minor updates to the StatefulSet
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17098 ) Change subject: k8s: minor updates to the StatefulSet .. k8s: minor updates to the StatefulSet I walked through some of the Kubernetes README and noticed the ports were off. This updates to use the typical defaults we use for masters and tservers. It also updates to use 4 tservers, which is closer to what would be used in production, given 3-4-3 replication. Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611 Reviewed-on: http://gerrit.cloudera.org:8080/17098 Tested-by: Kudu Jenkins Reviewed-by: Hao Hao Reviewed-by: Alexey Serbin --- M kubernetes/README.adoc M kubernetes/kudu-services.yaml M kubernetes/kudu-statefulset.yaml 3 files changed, 31 insertions(+), 25 deletions(-) Approvals: Kudu Jenkins: Verified Hao Hao: Looks good to me, approved Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611 Gerrit-Change-Number: 17098 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] k8s: minor updates to the StatefulSet
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17098 ) Change subject: k8s: minor updates to the StatefulSet .. Patch Set 1: Code-Review+2 Thank you for addressing these! -- To view, visit http://gerrit.cloudera.org:8080/17098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611 Gerrit-Change-Number: 17098 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 21:58:05 + Gerrit-HasComments: No
[kudu-CR] test: add more natural test for KUDU-2233
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. test: add more natural test for KUDU-2233 I have a patch in-flight that touches an area of the code around where we expect the infamous KUDU-2233 crash. Before merging it, I'd like to ensure the graceful handling of this corruption is unaffected, especially in cases where data has previously been corrupted and we've just upgraded to a newer version of Kudu. This patch adds such a test case, where data is corrupted but does not result in a crash, and the tserver is restarted with bits that handle corruption gracefully. In doing so, this patch also adds an off switch to all of the guardrails we installed for KUDU-2233. Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Reviewed-on: http://gerrit.cloudera.org:8080/17114 Reviewed-by: Alexey Serbin Tested-by: Andrew Wong --- M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_replica.cc 7 files changed, 173 insertions(+), 22 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Andrew Wong: Verified -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] test: add more natural test for KUDU-2233
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 4: Verified+1 Unrelated failure of LocationAwareRebalancingParamTest.Rebalance/idx8_rf4_t8_l_3_1_1_1 -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 21:56:50 + Gerrit-HasComments: No
[kudu-CR] test: add more natural test for KUDU-2233
Andrew Wong has removed a vote on this change. Change subject: test: add more natural test for KUDU-2233 .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 3: Code-Review+1 (3 comments) Per the Slack discussion, would be good to get numbers making sure this improves things before merging if we can. http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@11 PS3, Line 11: ensure nit: ensures http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@12 PS3, Line 12: implimentation nit: implementation http://gerrit.cloudera.org:8080/#/c/17129/3/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java File java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java: http://gerrit.cloudera.org:8080/#/c/17129/3/java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java@66 PS3, Line 66: private static final int kRandomSelectionInt = new Random().nextInt(Integer.MAX_VALUE); nit: Google's Java styleguide suggests RANDOM_SELECTION_INT for constants. I suppose we're not that strict with following it though, given they suggest "logger" instead of "LOG" https://google.github.io/styleguide/javaguide.html#s5-naming -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 21:56:27 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Ah great question. In that case, I think the correct course of action would Thanks! This sounds like a reasonable plan, indeed. I guess we can keep this patch as is, and address the issue in a separate patch. Whatever you think is best choice here. Exactly: if the write request is retried and TxnOpDispatcher tries to register a participant and issue BEGIN_TXN for the participant again, it will fail at registering the participant. Then TxnOpDispatcher will respond with appropriate error back to client, so the client will know that its write failed. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 21:39:51 + Gerrit-HasComments: Yes
[kudu-CR] test: add more natural test for KUDU-2233
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 4: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77 PS2, Line 77: #ifndef NDEBUG > I am fine with the current code, but if Alexey thinks differentiate the fla Thank you! http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77 PS2, Line 77: #ifndef NDEBUG > Done Apart from performance concern (I guess most likely the performance difference is negligible), I was not sure whether we want to keep a flag which does nothing. It was not a major point anyways, so I'm also good with either keeping it or removing. -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 21:25:30 + Gerrit-HasComments: Yes
[kudu-CR] test: add more natural test for KUDU-2233
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77 PS2, Line 77: #ifndef NDEBUG > I am fine with the current code, but if Alexey thinks differentiate the fla Done -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:59:58 + Gerrit-HasComments: Yes
[kudu-CR] test: add more natural test for KUDU-2233
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17114 to look at the new patch set (#4). Change subject: test: add more natural test for KUDU-2233 .. test: add more natural test for KUDU-2233 I have a patch in-flight that touches an area of the code around where we expect the infamous KUDU-2233 crash. Before merging it, I'd like to ensure the graceful handling of this corruption is unaffected, especially in cases where data has previously been corrupted and we've just upgraded to a newer version of Kudu. This patch adds such a test case, where data is corrupted but does not result in a crash, and the tserver is restarted with bits that handle corruption gracefully. In doing so, this patch also adds an off switch to all of the guardrails we installed for KUDU-2233. Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa --- M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/tablet/compaction.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_replica.cc 7 files changed, 173 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/17114/4 -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [client-test] more robust TestRetrieveAuthzTokenInParallel
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17128 ) Change subject: [client-test] more robust TestRetrieveAuthzTokenInParallel .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17128 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0afa5ae72ccb96a218b6c6ff026ad88a70dea4f7 Gerrit-Change-Number: 17128 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:28:16 + Gerrit-HasComments: No
[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 ) Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@118 PS3, Line 118: quiesceTserver.waitFor(); : assertEquals(0, quiesceTserver.waitFor() why do we need to call waitFor() twice here? -- To view, visit http://gerrit.cloudera.org:8080/17124 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1 Gerrit-Change-Number: 17124 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:25:15 + Gerrit-HasComments: Yes
[kudu-CR] k8s: minor updates to the StatefulSet
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17098 ) Change subject: k8s: minor updates to the StatefulSet .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b0084f70d30187bcca4e356e38c35b2486c611 Gerrit-Change-Number: 17098 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:12:13 + Gerrit-HasComments: No
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15 PS3, Line 15: This is expected to be a good balance > How do we know? What if a single process does a lot of full table scans? That single process is itself not distributed in nature at that point and I would expect if it is doing multiple full scans of the same tablet it would be a good choice to pick the already warm tablet. IMPALA-10481 describes why our current choice is suboptimal and this change aims to address that concern. We do plan to performance test and validate this change Impala side as well. The one case I could see where a single process might want to use multiple tablet replicas is the case where the tablet splitting feature is used on the same process, though there hasn't been a lot of performance evaluation of that yet. IMPALA-9792 is still open tracking that evaluation from the Impala side (Which is the only known usage today). http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191 PS3, Line 191: still benefit from spreading the load across replicas > I meant we already have a recommendation to have just a single client per a I didn't have a good reason to suspect that per-client would necessarily be a better choice than per process. Especially considering we often see a single client per process anyway. The benefit of per process is that it matches the current java behavior and it means that multiple clients can be used without impacting the selection affinity on a given process. I could see a world where both is beneficial but didn't want to overcomplicate the change. I think the "ultimate" fix would be to allow to user to provide their own "seed" for replica selection randomness. I think that could be a useful follow on change to track with a jira. -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:09:00 + Gerrit-HasComments: Yes
[kudu-CR] test: add more natural test for KUDU-2233
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77 PS2, Line 77: DEFINE_bool(dcheck_on_kudu_2233_invariants, true > Yes, this one different as I can see: it's not quite used in NDEBUG case, w I am fine with the current code, but if Alexey thinks differentiate the flag to be not exposed in NDEBUG case can help with the performance of execution. Then I am also good with that. -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:03:04 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Right: I meant the case when a participant is registered, but BEGIN_TXN has Ah great question. In that case, I think the correct course of action would be to treat the non-existent transaction on the participant as OK, since the end result is still that the transaction doesn't exist once the ABORT_TXN returns, and I can update this patch and my following to reflect that. Then, the question is what happens if the write is retried after that happens, and the answer there is that the new dispatcher's call to RegisterParticipant would fail. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:02:43 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:56:59 + Gerrit-HasComments: No
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191 PS3, Line 191: still benefit from spreading the load across replicas > Why not to have this per client? I think that making it the same for all c I meant we already have a recommendation to have just a single client per application. So, if for some reason it it's necessary to escape from this affinity in the same application, keeping this per-process makes it impossible. However, if this were an affinity per client instance, then in case if this affinity is not desired, at least there is a choice to instantiate multiple clients. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:55:47 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17129/3//COMMIT_MSG@15 PS3, Line 15: This is expected to be a good balance How do we know? What if a single process does a lot of full table scans? http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/3/src/kudu/client/client-internal.cc@191 PS3, Line 191: still benefit from spreading the load across replicas Why not to have this per client? I think that making it the same for all clients in the same process is rather a restriction. -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:48:26 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@126 PS2, Line 126: )); > nit: drop Done http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134 PS2, Line 134: while (1) { > Better to move this close where it's first used in SelectTServer() function Done http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134 PS2, Line 134: { > nit: kNamingConvention? per https://google.github.io/styleguide/cppguide.ht Done http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134 PS2, Line 134: whil > nit: could also be const Done http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2614 PS2, Line 2614: // Marking stale forces a lookup. > nit: maybe comment that marking staleness forces a lookup? Done http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2637 PS2, Line 2637: RemoteT > Should be c2? Done -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:39:56 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17129 to look at the new patch set (#3). Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. KUDU-3248: Match C++ replica selection behavior of Java client The C++ client currently uses a new random value every time a replica is selected. Alternatively, the Java client uses a static value that ensure the selection for a single process or application remains deterministic. This patch adjusts the C++ implimentation to match the Java client behavior. This is expected to be a good balance of efficient use of cache memory and distribution of load across the replicas given a single process will always get the same choice resulting in cache hits for follow up scans, but seperate processes will get a potentially different random selection. The reviews on the Java patch here have some good context and discussion: https://gerrit.cloudera.org/#/c/12158/ Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java M src/kudu/client/client-internal.cc M src/kudu/client/client-test.cc M src/kudu/client/client.h M src/kudu/common/common.proto 5 files changed, 86 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/17129/3 -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@126 PS2, Line 126: providing nit: drop http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134 PS2, Line 134: static nit: could also be const http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134 PS2, Line 134: random_selection_int nit: kNamingConvention? per https://google.github.io/styleguide/cppguide.html#Constant_Names http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2614 PS2, Line 2614: rt->MarkStale(); nit: maybe comment that marking staleness forces a lookup? http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-test.cc@2637 PS2, Line 2637: client_ Should be c2? -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:18:23 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/2/src/kudu/client/client-internal.cc@134 PS2, Line 134: static int random_selection_int = InitRandomSelectionInt(); Better to move this close where it's first used in SelectTServer() function. Also can further qualify with const. -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:10:58 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17129 to look at the new patch set (#2). Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. KUDU-3248: Match C++ replica selection behavior of Java client The C++ client currently uses a new random value every time a replica is selected. Alternatively, the Java client uses a static value that ensure the selection for a single process or application remains deterministic. This patch adjusts the C++ implimentation to match the Java client behavior. This is expected to be a good balance of efficient use of cache memory and distribution of load across the replicas given a single process will always get the same choice resulting in cache hits for follow up scans, but seperate processes will get a potentially different random selection. The reviews on the Java patch here have some good context and discussion: https://gerrit.cloudera.org/#/c/12158/ Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java M src/kudu/client/client-internal.cc M src/kudu/client/client-test.cc M src/kudu/client/client.h M src/kudu/common/common.proto 5 files changed, 84 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/17129/2 -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@124 PS1, Line 124: GoogleOnceType once; : // This random integer is used when making any random choice for replica : // selection. It is static to provide a deterministic selection for any given : // process and therefore also providing better cache affinity while ensuring : // that we can still benefit from spreading the load across replicas for : // other processes and applications. : int random_selection_int; : : void InitRandomSelectionInt() { : std::random_device rdev; : std::mt19937 gen(rdev()); : random_selection_int = gen(); : } : : } > Since this is client-internal.cc, C++11 and above is permitted so can use s I will use static initialization as mentioned below. http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@270 PS1, Line 270: if (!local.empty()) { : ret = local[random_selection_int % local.size()]; : } else if (!same_location.empty()) { : ret = same_location[random_selection_int % same_location.size()]; : } else if (!filtered.empty()) { : ret = filtered[random_selection_int % filtered.size()]; : } > Considering random_selection_int is used only in this function, std::once i Awesome! I wasn't totally clear on that behavior given my lack of C++ expertise and wen't with what I saw other locations in the code base using. I am glad I can switch to something more straight forward. -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 18:53:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17129 ) Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@124 PS1, Line 124: GoogleOnceType once; : // This random integer is used when making any random choice for replica : // selection. It is static to provide a deterministic selection for any given : // process and therefore also providing better cache affinity while ensuring : // that we can still benefit from spreading the load across replicas for : // other processes and applications. : int random_selection_int; : : void InitRandomSelectionInt() { : std::random_device rdev; : std::mt19937 gen(rdev()); : random_selection_int = gen(); : } : : } Since this is client-internal.cc, C++11 and above is permitted so can use std::once instead. Already see unique_ptr and shared_ptr being used in this file. http://gerrit.cloudera.org:8080/#/c/17129/1/src/kudu/client/client-internal.cc@270 PS1, Line 270: if (!local.empty()) { : ret = local[random_selection_int % local.size()]; : } else if (!same_location.empty()) { : ret = same_location[random_selection_int % same_location.size()]; : } else if (!filtered.empty()) { : ret = filtered[random_selection_int % filtered.size()]; : } Considering random_selection_int is used only in this function, std::once is not necessary as static initialization is done just once and thread safe from C++11 onwards. static int random_selection_int = InitRandomSelectionInt(); -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 18:45:54 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 3: (2 comments) Not a review. Just passing by... http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h File src/kudu/tablet/lock_manager.h: http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h@19 PS3, Line 19: #include Nit: Can use the C++ compatible instead. http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h@157 PS3, Line 157: PartitionLock I find it weird that a class named *Lock doesn't have any lock/unlock/release methods associated with it explicitly or implicitly and doesn't inherit or compose a lock primitive. This might as well be a typedef to TxnId. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Feb 2021 18:23:07 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 3: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/17097/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17097/3//COMMIT_MSG@12 PS3, Line 12: hold nit: acquire? http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager-test.cc@195 PS3, Line 195: threads.emplace_back([&] { > warning: 'emplace_back' is called inside a loop; consider pre-allocating th nit: since lock acquisition is quite fast operation, maybe it's worth synchronizing the start of the threads' activity by a barrier? http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager-test.cc@208 PS3, Line 208: TEST_F(LockManagerTest, TestConcurrentLockUnlockPartitionWithInvalidID) { nit: since lock acquisition is quite fast operation, maybe it's worth synchronizing the start of the threads' activity by a barrier? -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Feb 2021 18:07:44 + Gerrit-HasComments: Yes
[kudu-CR] tablet: allow interleaving of row liveness between compaction input rows
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16752 ) Change subject: tablet: allow interleaving of row liveness between compaction input rows .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16752 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I042a7d70d32edf9d2a3a077790821893f162880a Gerrit-Change-Number: 16752 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 26 Feb 2021 16:56:41 + Gerrit-HasComments: No
[kudu-CR] test: add more natural test for KUDU-2233
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17114 ) Change subject: test: add more natural test for KUDU-2233 .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: http://gerrit.cloudera.org:8080/#/c/17114/2/src/kudu/tablet/compaction.cc@77 PS2, Line 77: DEFINE_bool(dcheck_on_kudu_2233_invariants, true > I can surround this with some ifndef sure, but we do have several test-only Yes, this one different as I can see: it's not quite used in NDEBUG case, while other test-only flags are used in both debug and NDEBUG case. At lines 308-311 and 316-319 the usage of this flag reduces to an empty if() clause in NDEBUG mode. Why to keep such a flag then in NDEBUG case? -- To view, visit http://gerrit.cloudera.org:8080/17114 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icac3ad0a1b6bb9b17d5b6a901dc5bba79c0840fa Gerrit-Change-Number: 17114 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 16:56:12 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3248: Match C++ replica selection behavior of Java client
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17129 Change subject: KUDU-3248: Match C++ replica selection behavior of Java client .. KUDU-3248: Match C++ replica selection behavior of Java client The C++ client currently uses a new random value every time a replica is selected. Alternatively, the Java client uses a static value that ensure the selection for a single process or application remains deterministic. This patch adjusts the C++ implimentation to match the Java client behavior. This is expected to be a good balance of efficient use of cache memory and distribution of load across the replicas given a single process will always get the same choice resulting in cache hits for follow up scans, but seperate processes will get a potentially different random selection. The reviews on the Java patch here have some good context and discussion: https://gerrit.cloudera.org/#/c/12158/ Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 --- M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java M src/kudu/client/client-internal.cc M src/kudu/client/client-test.cc M src/kudu/client/client.h M src/kudu/common/common.proto 5 files changed, 87 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/17129/1 -- To view, visit http://gerrit.cloudera.org:8080/17129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaa55e88b4a222fabfaa7fa521c24482cc6816b04 Gerrit-Change-Number: 17129 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke