Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17037 )
Change subject: KUDU-2612 tablet servers automatically register txn participants ...................................................................... Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/client/client-test.cc@7694 PS6, Line 7694: } > Maybe count the rows to verify the state? Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/ops/participant_op.cc@197 PS6, Line 197: RETURN_NOT_OK(tablet_replica_->UnregisterTxnOpDispatcher(txn_id())); > So begin commit can time out if the there are still writes pending when the Right: if there are pending operations in the context of a particular transaction, UnregisterTxnOpDispatcher() returns ServiceUnavailable(). It seems reasonable to me: how it's possible to start committing a transaction when there are still non-applied write operations? http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@155 PS6, Line 155: txn_participant_registrant > nit: doc this? Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@176 PS6, Line 176: void SubmitPendingTxnOps(int64_t txn_id); : void CancelPendingTxnOps(int64_t txn_id, const Status& s); > nit: would you mind doc these two? Thanks! Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@405 PS6, Line 405: // > nit: spurious line Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@456 PS6, Line 456: ready_, ops_queue_. > nit: if so, would be nice to at least mention how users should reason about Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.h@458 PS6, Line 458: bool ready_; : bool unregistered_; : Status inflight_status_; : std::deque<std::unique_ptr<WriteOpState>> ops_queue_; > nit: would you mind also doc these? Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@79 PS6, Line 79: 2 > how this is evaluated. If default is 2, it doesn't seem to help a lot for b It should help: keep in mind that clients usually buffer inserts/updates to many rows in one write request. Maybe, the name for the flag isn't quite good? I can share more data once I have performance tests to cover this. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@552 PS6, Line 552: dispatcher = FindOrNull(txn_op_dispatchers_, txn_id); : if (!dispatcher) { : dispatcher = &EmplaceOrDie( : &txn_op_dispatchers_, : std::piecewise_construct, : std::forward_as_tuple(txn_id), : std::forward_as_tuple(this, FLAGS_tablet_max_pending_txn_write_ops)); : } > Does LookupOrEmplace work here? Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@567 PS6, Line 567: SubmitTxnWriteCb > nit: seems we typically name callbacks based on what we're calling back fro Sure, let's keep it consistent. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@590 PS6, Line 590: txn_op_dispatchers_.erase(it); > Do we have any guarantees that we've submitted all the pending writes? Or a Yes, we do: MarkUnregistered() would return non-OK status otherwise one line above. In addition, TxnOpDispatcher doesn't accept any new operations once it's marked as unregistered (see TxnOpDispatcher::Dispatch()). http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@642 PS6, Line 642: txn_registrator > nit: dispatcher? Same elsewhere Done http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@1227 PS6, Line 1227: transaction is not open" > Don't we also unregister if we cancel due to not being the leader? Eg in Su Right -- TxnOpDispatcher::Unregister() is called upon various errors, and not being a leader (along with not being to register a participant) are the reasons. I replaced it with "tablet replica could not accept txn write operation". That's still vague, but I don't think we want to expose TxnOpDispatcher state in the error messages. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@1258 PS6, Line 1258: // Store the information necessary to recreate WriteOp instance: this is : // useful if TabletReplica::SubmitWrite() returns non-OK status. > nit: it'd be nice if you could mention the lifecycle of the request and res These pointers are stored at this level (i.e. scope of the while() cycle) and cannot become bogus after calling std::move(op). I'm not sure what to comment about here. http://gerrit.cloudera.org:8080/#/c/17037/6/src/kudu/tablet/tablet_replica.cc@1304 PS6, Line 1304: Cleanup > What's the point of returning a status here? Doesn't this always return 'st Yep: I guess there isn't any point returning anything from Cancel() because the return code isn't used. However, Cleanup() is used in Submit(), and it returns the status as needed. Done. -- To view, visit http://gerrit.cloudera.org:8080/17037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6 Gerrit-Change-Number: 17037 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 20 Feb 2021 02:21:56 +0000 Gerrit-HasComments: Yes