[kudu-CR] KUDU-2612: background task to commit transaction
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 16: Code-Review+2 (3 comments) Looks good to me. I didn't get a chance to check the tests closely. But I believe Alexey has done so. Thanks a lot Andrew for the patch! http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@257 PS14, Line 257: participant_ids_[participant_idx], : std::move(op_pb), : MonoDelta::FromMilliseconds(FLAGS_txn_background_rpc_timeout_ms), : std::move(participated_cb), : &begin_commit_timestamps_[participant_idx]); : } > It's not identical everywhere, but sure. Ack http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@268 PS14, Line 268: pated_cb = [this, scoped_this = std::move(scoped_this), : participant_idx, commit_timestamp] (const Status& s) { : if (IsShuttingDownCleanupIfLastOp()) { : > Yeah, added a comment to make that clear that it's intentional. Ack http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@787 PS14, Line 787: : std::lock_guar > There is always at least the BEGIN_COMMIT op timestamp, regardless of any w Ack -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 16 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 05 Feb 2021 08:12:29 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 16: (1 comment) Ah oops, clicked submit without another round from Hao. I can address any further feedback in follow-up patches. http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@205 PS7, Line 205: CK_LT(participant_idx, participant_ids_.size()); : // Status callback ca > nit: should this comment be moved to L201 to cover 's.IsNotFound()' case? Done -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 16 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 05 Feb 2021 07:21:37 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Reviewed-on: http://gerrit.cloudera.org:8080/16952 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,366 insertions(+), 113 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 16 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 15: Code-Review+2 (5 comments) Looks good to me! Maybe, Hao has more feedback? http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc File src/kudu/integration-tests/txn_commit-itest.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302 PS7, Line 302: , TestCommi > Done Thank you! http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690 PS7, Line 690: shared_ptr Yeah, calling Shutdown() from the mini server helps exercise any potential Thanks! http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754 PS7, Line 754: // TxnStatusManager. > That case is very similar to the TestCommitWhenParticipantsAreDown case, th Sure -- it seems ExternalMiniCluster-based harness fits better for that purpose. Also, if you are sure the control paths and everything else is the same, probably having just TestCommitWhenParticipantsAreDown should be good enough. http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201 PS7, Line 201: return false; : } : > We discussed this further with Hao and agreed that it's worth exploring an Thank you! http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201 PS7, Line 201: return false; : } : > The case we're thinking about here is if a transaction is racing with a dro Dropping a partition after commit is complete is a bit different because at least at the commit timestamp all the written data was persisted in the database. However, since right now DDL operations doesn't participate in MVCC story in Kudu, there isn't any difference -- I agree. -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 15 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 05 Feb 2021 06:55:47 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 15: (8 comments) http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@92 PS14, Line 92: BeginCommitAsyn > nit: maybe name it BeginCommitAsyncTask if it is asynchronously as well. Sa Done http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@102 PS14, Line 102: commit_timestamp > nit: can you comment on how the timestamp will be used? Done http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@321 PS14, Line 321: std::lock_guard l(lock_); > nit: can you update the comment to add that we will also load commits in fl Done http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@123 PS14, Line 123: > nit: space. Done http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@215 PS14, Line 215: urn; : } > nit: remove? Done http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@257 PS14, Line 257: participant_ids_[participant_idx], : std::move(op_pb), : MonoDelta::FromMilliseconds(FLAGS_txn_background_rpc_timeout_ms), : std::move(participated_cb), : &begin_commit_timestamps_[participant_idx]); : } > nit: do you think this block of code worth writing to a method of own? As I It's not identical everywhere, but sure. http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@268 PS14, Line 268: pated_cb = [this, scoped_this = std::move(scoped_this), : participant_idx, commit_timestamp] (const Status& s) { : if (IsShuttingDownCleanupIfLastOp()) { : > So we proceed to ScheduleFinalizeCommitWrite even we encounter error here? Yeah, added a comment to make that clear that it's intentional. http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@787 PS14, Line 787: : std::lock_guar > This happens when there is no ops in the transaction? What does invalid tim There is always at least the BEGIN_COMMIT op timestamp, regardless of any write ops. Invalid timestamp is just a dummy timestamp. -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 15 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 05 Feb 2021 05:28:33 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#15). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,366 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/15 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 15 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@92 PS14, Line 92: BeginCommitTask nit: maybe name it BeginCommitAsyncTask if it is asynchronously as well. Same for below. http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@102 PS14, Line 102: commit_timestamp nit: can you comment on how the timestamp will be used? http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@321 PS14, Line 321: // Loads the contents of the transaction status tablet into memory. nit: can you update the comment to add that we will also load commits in flight and resume these commits? http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@205 PS7, Line 205: return; : } nit: should this comment be moved to L201 to cover 's.IsNotFound()' case? http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@123 PS14, Line 123: " nit: space. http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@215 PS14, Line 215: If the participant has been deleted (it's not found), proceed as if it : // were committed. nit: remove? http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@257 PS14, Line 257: if (stop_task_ || txn_status_manager_->shutting_down()) { : if (--ops_in_flight_ == 0) { : txn_status_manager_->RemoveCommitTask(this->txn_id_.value(), this); : } : return; : } nit: do you think this block of code worth writing to a method of own? As I see it is repeating serval places? http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@268 PS14, Line 268: (PREDICT_FALSE(!s.ok())) { : LOG(WARNING) << Substitute("Participant $0 FINALIZE_COMMIT op returned $1", : participant_ids_[participant_idx], s.ToString()); : } So we proceed to ScheduleFinalizeCommitWrite even we encounter error here? http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@787 PS14, Line 787: // If there are no participants for this transaction; just write an invalid : // timestamp This happens when there is no ops in the transaction? What does invalid timestamp mean in this case? -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 14 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 05 Feb 2021 01:35:35 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#14). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,331 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/14 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 14 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#13). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,329 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/13 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 13 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#12). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,328 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/12 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 12 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#11). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,326 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/11 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#10). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,325 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/10 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#9). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. Some follow-ups to expect: - This isn't as robust to failures as an approach that writes an intermediate state to the TxnStatusManager in between steps 3 and 4. A follow-up patch will implement that. - A separate patch will implement aborting transactions. - I disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,324 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/9 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201 PS7, Line 201: if (PREDICT_FALSE(s.IsTimedOut())) { : // Retry timeout errors. Other transient errors should be retried by the : // client until timeout. > The case we're thinking about here is if a transaction is racing with a dro We discussed this further with Hao and agreed that it's worth exploring an implementation in which we write an intermediate record to the TSM before starting to send out the FINALIZE_COMMIT ops. This would allow us to be more robust against unforeseen issues that might require us to abort post-commit-call. -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 02 Feb 2021 21:58:19 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 8: (11 comments) http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc File src/kudu/integration-tests/txn_commit-itest.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302 PS7, Line 302: DeleteTable > Does it work as expected as well if dropping a tablet (just a part of a tab Done http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@316 PS7, Line 316: TestCommitAfterDroppingRangeP > I think it would be great to add a scenario when TxnStatusManager instance Yeah, I can add that as a follow-up if you dont mind it being in a separate patch. Pausing is tricky with my current test harnesses, so I'd like to add some additional coverage using an EMC, per your other suggestions. http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@565 PS7, Line 565: for (const auto& s : results) { : EXPECT_OK(s); : } > Is this going to put the txn keepalive thread to sleep as well? I'm not su Ah indeed. Done http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690 PS7, Line 690: // Shut down our par > I guess a real 'abrupt' way of terminating the process is sending SIGKILL : Yeah, calling Shutdown() from the mini server helps exercise any potential races in the destructor, but SIGKILL would be more organic. I'll add that as a follow-up change. http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754 PS7, Line 754: } > It would be great to add at least one failure scenario, when finalizing the That case is very similar to the TestCommitWhenParticipantsAreDown case, though less trivial to implement with 1, 2, and 3 node clusters. I can add that as a follow-up. http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152 PS4, Line 152: > Ah, indeed. It seems I misunderstood where the status comes from (even if Added a comment http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201 PS7, Line 201: if (PREDICT_FALSE(s.IsTimedOut())) { : // Retry timeout errors. Other transient errors should be retried by the : // client until timeout. > After some consideration, I'm not quite sure this 'good-riddance' semantic The case we're thinking about here is if a transaction is racing with a drop partition. The observation I'm relying on here is that the result of deleting the partition after committing and deleting the partition before committing are the same -- the end result is the partition is deleted with no way of reading the data. And if the table were deleted while inserting data, it's the onus of the writing application to detect that, since there'd be NotFound errors well before the commit phase. It's also important to pass through this case rather than aborting for the following case: 1. begin committing a transaction with participants 1..5, running the BEGIN_COMMIT op on all participants 2. finalize commit on participants 1..5, running the FINALIZE_COMMIT op on all participants 3. leadership changes or the node crashes, a new leader is elected and the commit task is retried 4. partition 5 is deleted 5. we notice that partition 5 is deleted. Should we abort the transaction? But we've already finalized the commit on some participants! I added a comment trying to explain that. http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@286 PS7, Line 286: txn_client_->ParticipateInTransactionAsync( : participant_ids_[participant_idx], > nit: just for the sake of working with std::atomic in a consistent way, may Done http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@463 PS7, Line 463: > Should this be a parameter controlled by a run-time flag? Done http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@471 PS7, Line 471: now, just > nit: maybe, move this outside of the 'for ()' cycle? Done http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@492 PS7, Line 492: { : std::lock_guard l(lock_); : highest_txn_id_ = std::max(highest_txn_id, highest_txn_id_); : > Why not to do so prior to starting new commit tasks? I g
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#8). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. A separate patch will implement aborting transactions. I also disabled the background tasks in some tests that assume state changes are entirely controlled by clients. A follow-up change will address these to account for the state changes more organically. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 18 files changed, 1,323 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/8 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 7: Code-Review+1 (11 comments) http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc File src/kudu/integration-tests/txn_commit-itest.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302 PS7, Line 302: DeleteTable Does it work as expected as well if dropping a tablet (just a part of a table), say, dropping one range of a range-partitioned table? Is it worth adding a test scenario for such case? http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@316 PS7, Line 316: TestRestartingWhileCommitting I think it would be great to add a scenario when TxnStatusManager instance is frozen/unfrozen (i.e. sent SIGSTOP/SIGCONT) during the commit phase. That should include a case when the leadership is transferred to another instance because of the unresponsive process hosting the former TxnStatusManager leader replica, but once the commit finalization is picked up by the new leader, the former leader is back and tries to continue its the commit phase finalization routine. It guess it might be a good test to verify the robustness of the commit phase and re-enterability of corresponding methods. What do you think? http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@565 PS7, Line 565: : // Wait a bit to allow would-be background aborts to happen. : SleepFor(MonoDelta::FromSeconds(1)); Is this going to put the txn keepalive thread to sleep as well? I'm not sure. Instead, why not to move the 'txn' handle into a sub-scope, and serialize it into a string which is stored in the top-level scope. Then, when the original 'txn' scope goes out of scope, it's possible to de-serialize the handle and use the de-serialized handle for calls like KuduTransaction::IsCommitComplete() and alike. What do you think? http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690 PS7, Line 690: tsm_ts_->Shutdown(); I guess a real 'abrupt' way of terminating the process is sending SIGKILL :) Do you think it's worth adding a scenario like that in this test suite? Doing that in a separate change list might be the best option, I guess. http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754 PS7, Line 754: } It would be great to add at least one failure scenario, when finalizing the commit phase fails to verify that TxnManager doesn't crash and the transaction fails to finalize: * one participant responses with an error to its BEGIN_COMMIT request (e.g., only one replica out of 3 is running -- BTW, is it going to turn into Status::TimedOut() seen at the TxnStatusManager's side or that would be something else?) * sending the commit timestamp from a participant fails (again, because of only one replica out of 3 is running) http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152 PS4, Line 152: > I don't think so, since I'm pretty sure the client automatically retries Se Ah, indeed. It seems I misunderstood where the status comes from (even if you added a comment :) ). Thank you for the clarification! http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201 PS7, Line 201: if (PREDICT_FALSE(s.IsNotFound())) { : LOG(INFO) << Substitute("Participant $0 was not found: $1", : participant_ids_[participant_idx], s.ToString()); After some consideration, I'm not quite sure this 'good-riddance' semantic is a good fit for all cases. E.g., what if some batch of data is being inserted, and a tablet (or a whole table) is dropped unintentionally during that process? From the user perspective (who isn't aware of the dropped tablet/table) that should be certainly an error -- the data that they thought is reliably persisted after the commit has successfully finalized is actually gone (what a surprise, isn't it?). Should we instead fail a transaction is such a case? Maybe, at least add a flag to make this behavior configurable, where by default a transaction fails to commit if one of its actors with some added data disappeared during the commit phase? http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@286 PS7, Line 286: DCHECK_EQ(0, ops_in_flight_); : o
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152 PS4, Line 152: > Does it make sense to retry Status::ServiceUnavailable() as well? I don't think so, since I'm pretty sure the client automatically retries ServiceUnavailable errors until the timeout. Unless there's a specific code path you're referring to? -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 02 Feb 2021 01:13:47 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#7). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. A separate patch will implement aborting transactions. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 17 files changed, 1,264 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/7 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#6). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. A separate patch will implement aborting transactions. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_manager-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 17 files changed, 1,263 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/6 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: background task to commit transaction
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@146 PS4, Line 146: (--ops_in_flight_ == 0) > I'm not sure it's safe even if ops_in_flight_ is an atomic. What happens i Ah, operator--() is actually fetch_sub(1)-1. In such case, it should be safe. Then the only concern is not to decrement it past 0, but it seems it cannot be the case in this code. -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 01 Feb 2021 09:04:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@146 PS4, Line 146: (--ops_in_flight_ == 0) I'm not sure it's safe even if ops_in_flight_ is an atomic. What happens if two threads do this simultaneously when the value of 'ops_in_flight_' was 2? As I understand, atomic guarantees atomic read and write, but I'm not sure about the serialization of the access. Should it be compare_exchange_strong() instead? http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152 PS4, Line 152: Retry timeout errors. Does it make sense to retry Status::ServiceUnavailable() as well? -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 01 Feb 2021 08:43:29 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: background task to commit transaction
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 ) Change subject: KUDU-2612: background task to commit transaction .. Patch Set 5: (20 comments) http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7200 PS4, Line 7200: > nit here and below where changed from 'false' --> 'true': consider not spec Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7201 PS4, Line 7201: shared_ptr txn; : ASSERT_OK(client_->NewTransaction(&txn)); : ASSERT_OK(txn->Commit()); : bool is_complete = false; > nit: I guess this should be removed once the scenario updated to -- I guess Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7217 PS4, Line 7217: > nit: maybe, set it to 'false' to be complete sure that IsCommitComplete() w Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc File src/kudu/integration-tests/txn_commit-itest.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@222 PS4, Line 222: KuduScanner scanner(table.get()); : RETURN_NOT_OK(scanner.Open()); : int rows = 0; > Actually, there is way of doing this without exposing transaction identifie Ah, thanks for the pointer! Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@283 PS4, Line 283: kN > nit: consider introducing a constant and using it here and below Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@291 PS4, Line 291: itComplete(& > nit: if it makes sense, consider omitting the 'wait' parameter since here s Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@316 PS4, Line 316: shared_ptr txn; : shared_ptr txn_session; : ASSERT_OK(BeginTransaction(participant_ids_, &txn, &txn_session)); : FLAGS_txn_status_manager_inject_latency_fina > nit: how long does it take to complete? If more than 3 seconds, consider d Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@663 PS4, Line 663: ASSERT_TRUE(completion_status.IsIncomplete()) << completion_status.ToString(); > nit: maybe, add ' << completion_status.ToString()' to diagnose a failure (i Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@680 PS4, Line 680: > Just in case I didn't miss it among those scenarios: it would be great to a Done http://gerrit.cloudera.org:8080/#/c/16952/1/src/kudu/transactions/txn_status_entry.cc File src/kudu/transactions/txn_status_entry.cc: http://gerrit.cloudera.org:8080/#/c/16952/1/src/kudu/transactions/txn_status_entry.cc@a45 PS1, Line 45: > This condition no longer hold? No, e.g. when reloading the tablet after committing. Seems our tests didn't cover those codepaths previously. http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.h@80 PS4, Line 80: > nit: it would be nice to document the parameter and give it a bit more mean Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@137 PS4, Line 137: using kudu::pb_util::SecureShortDebugString; > nit: add a DCHECK() for the index 'i' in 'participant_ids_'? Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@208 PS4, Line 208: such errors wou > nit: if it makes sense, unify using 'finalize commit' and 'FINALIZE_COMMIT' Done http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@253 PS4, Line 253: if (PR > What if the commit_pool_ is being shut down? Is it supposed to crash? Nice catch. I've updated the shutdown logic such that we always finish the commit tasks before shutting down the threadpool. http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_system_client.cc File src/kudu/transactions/txn_system_client.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_system_client.cc@449 PS4, Line 449: Status s; : do { : s = GetClient(client); : if (PREDICT_TRUE(s.ok())) { : DCHECK(*client); : return Status::OK(); : } : SleepFor(MonoDelta::FromMilliseconds(100)); : }
[kudu-CR] KUDU-2612: background task to commit transaction
Hello Alexey Serbin, Kudu Jenkins, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16952 to look at the new patch set (#5). Change subject: KUDU-2612: background task to commit transaction .. KUDU-2612: background task to commit transaction This patch introduces background tasks that get run when KuduTransaction::Commit() is called. The typical workflow is as follows: 1. Commit() is called, resulting in a BeginCommitTransaction() call on the TxnStatusManager. 2. An update is made to the transaction status table, marking the transaction's state as COMMIT_IN_PROGRESS. 3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent asynchronously to every participant of the transaction. 4. Once all responses are received from the participants, a commit timestamp is determined, and FINALIZE_COMMIT ops are sent asynchronously to every participant. 5. Once all responses are received from the participants, an update is made to the transaction status table, marking the transaction's state as COMMITTED. There are some nuances here around error handling. Namely, what do we do if there are errors in sending the above requests? Well, it depends on the error. Transient errors (i.e. timeouts) are simply retried. More permanent errors need a bit more thought though: - If a participant has been deleted, what do we do? This patch makes a best effort attempt to abort the transaction if so. - Any other kinds of errors (e.g. illegal state errors from a participant) aren't expected in normal operation of a cluster. For this, we stop the commit task and log a warning. Hopefully an operator can intervene. A separate patch will implement aborting transactions. Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e --- M src/kudu/client/client-test.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/txn_commit-itest.cc M src/kudu/integration-tests/txn_status_table-itest.cc M src/kudu/master/txn_manager-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/txn_coordinator.h M src/kudu/transactions/transactions.proto M src/kudu/transactions/txn_status_entry.cc M src/kudu/transactions/txn_status_manager-test.cc M src/kudu/transactions/txn_status_manager.cc M src/kudu/transactions/txn_status_manager.h M src/kudu/transactions/txn_system_client.cc M src/kudu/transactions/txn_system_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 16 files changed, 1,258 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/5 -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)