Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16952 )
Change subject: wip KUDU-2612: background task to commit transaction ...................................................................... Patch Set 4: (16 comments) Overall looks pretty good after a quick initial look. I haven't dug into the logic of the orchestration tasks -- I'm planning to do that over the coming weekend. http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7200 PS4, Line 7200: true /* wait */ nit here and below where changed from 'false' --> 'true': consider not specifying the 'wait' parameter, since it's 'true' by default http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7201 PS4, Line 7201: // TODO(aserbin): when txn lifecycle is properly implemented, inject a : // delay into the txn finalizing code to make sure : // the transaction stays in the COMMIT_IN_PROGRESS state : // for a while nit: I guess this should be removed once the scenario updated to -- I guess it's doesn't make much sense to fuss around artificial delays once the real transaction orchestration is ready. http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7217 PS4, Line 7217: true nit: maybe, set it to 'false' to be complete sure that IsCommitComplete() works as expected http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc File src/kudu/integration-tests/txn_commit-itest.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@222 PS4, Line 222: // TODO(awong): expose transaction ID as public API? Might be useful for : // debugging and general observability. : int64_t txn_id = cur_txn_id_++; Actually, there is way of doing this without exposing transaction identifiers as part of public API -- it's a undocumented feature and it relies on implementation specific details, but it's fine to be used in tests. The idea is to serialize the transaction token and then peek into the TxnTokenPB::txn_id, assuming the serialized token string is just a serialized TxnTokenPB instance (it's so at this point). You can find an example of this trick at https://github.com/apache/kudu/blob/948f92e787136f12dbfb3e7195fef24db0be0088/src/kudu/client/client-test.cc#L441-L448 http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@283 PS4, Line 283: 10 nit: consider introducing a constant and using it here and below http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@291 PS4, Line 291: /*wait*/true nit: if it makes sense, consider omitting the 'wait' parameter since here since it's 'true' by default http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@316 PS4, Line 316: Status completion_status; : bool is_complete; : Status s = txn->IsCommitComplete(&is_complete, &completion_status); : ASSERT_TRUE(s.IsTimedOut()) << s.ToString(); nit: how long does it take to complete? If more than 3 seconds, consider declaring this scenario as 'slow', requiring KUDU_ALLOW_SLOW_TESTS=1 to run http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@663 PS4, Line 663: ASSERT_TRUE(completion_status.IsIncomplete()); nit: maybe, add ' << completion_status.ToString()' to diagnose a failure (if any ever occurs)? http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@680 PS4, Line 680: Just in case I didn't miss it among those scenarios: it would be great to add a scenario to make sure the client doesn't need to send keepalive messages for a transaction once it initiated committing it. I guess the scenario would be something like: * delay all those commit orchestration tasks * start a transaction, resulting in a 'txn' handle * call txn->Commit(false) * remove the txn out of the scope (so it starts automatic heartbeating) * make sure the transaction isn't aborted automatically but rather committed after the delayed orchestration tasks are finally run http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.h@80 PS4, Line 80: int i nit: it would be nice to document the parameter and give it a bit more meaningful name http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@137 PS4, Line 137: // Status callback called with the result from the participant op. This is nit: add a DCHECK() for the index 'i' in 'participant_ids_'? http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@208 PS4, Line 208: finalize commit nit: if it makes sense, unify using 'finalize commit' and 'FINALIZE_COMMIT' in the log messages http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@253 PS4, Line 253: CHECK_OK What if the commit_pool_ is being shut down? Is it supposed to crash? http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_system_client.cc File src/kudu/transactions/txn_system_client.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_system_client.cc@449 PS4, Line 449: Status s; : while (MonoTime::Now() < deadline) { : s = GetClient(client); : if (PREDICT_TRUE(s.ok())) { : DCHECK(*client); : return Status::OK(); : } : SleepFor(MonoDelta::FromMilliseconds(100)); : } nit: probably, rewriting this using do { ... } while () might be a slightly better approach it's expected to have the client initialized most of the times -- that way there wouldn't be an extra call to MonoTime::Now() http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/tserver/tablet_server.cc File src/kudu/tserver/tablet_server.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/tserver/tablet_server.cc@48 PS4, Line 48: using kudu::transactions::TxnSystemClient; nit: is this really needed here? http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/tserver/ts_tablet_manager.cc@1275 PS4, Line 1275: // Now that our TxnStatusManagers have shut down, clean up out threadpool. : txn_commit_pool_->Shutdown(); Is this the second time calling shutdown on the same pool? I'd expect that calling this at line 1258 should be enough already. -- To view, visit http://gerrit.cloudera.org:8080/16952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e Gerrit-Change-Number: 16952 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Sat, 30 Jan 2021 04:49:00 +0000 Gerrit-HasComments: Yes