Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )
Change subject: KUDU-2612: acquire and release partition lock ...................................................................... Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: auto resp_error = StatusFromPB(resp.error().status()); > This is marked 'Done' but I don't see that a scenario with calling Particip Done as TxnParticipantITest.TestTxnSystemClientRepeatCalls http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: // TODO(awong): update this when impleme > nit: this is marked as 'Done', and I saw TODO is added in other places. Is Done http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc@282 PS5, Line 282: ASSERT_EQ(txns, replicas[i]->tablet()->txn_participant()->GetTxnsForTests()); : }); : } > nit: is this change essential? If not, maybe it's better to return back to Ah I added this because the list of transactions can be quite long, and a difference in count is a bit easier to decipher and think about than comparing the full lists that get printed out. I can revert it -- it was more useful for debugging and can be added back if we ever need to debug this further. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h File src/kudu/tablet/ops/participant_op.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h@75 PS3, Line 75: ors the given 'op_id > Can this method be invoked as a part of larger operation which might have a I'm going to punt on this. I think there'd be a decent amount of plumbing to get the client's deadline, and it'd also be for leaders only. I added a TODO instead. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc@212 PS5, Line 212: "START. Timestamp: $0", clock::HybridClock::GetPhysicalValueMicros(state_->times > How to release the partition lock if the following code kicks in when the d If replication fails, the ScopedPartitionLock is dropped, and if that was the last reference to the partition lock, the lock is released. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172 PS5, Line 172: // Acquire a shared lock on the given transaction, to ensure the : // transaction's state doesn't change while the given write is in flight. : Status AcquireTxnLock(int64_t txn_id, WriteOpState* op_state); > BTW, why do we need this method when we already have a similar one for Writ IIRC from my discussion with Hao, doing this in the BEGIN_TXN lets us catch races earlier. OTOH, thinking about it more, we ought to instead have the goal of taking the lock for as little time as possible, so I've gone and updated this so we only take the lock for writes. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@165 PS3, Line 165: If 'must_acquire' is true, : // then wait until the lock is acquired > Write operations are executed in the context of an RPC which have a timeout The op will proceed past the timeout. It might be avoidable, but I'm not sure it's a dealbreaker. At least, outside of the dispatcher, we don't have checks along the write path for timeouts to exit early, and at least in the context of the OpDrivers, I don't think it's worth the fuss for a single op, given we have similarly timed acquires on the write path already. http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/txn_participant.cc File src/kudu/tablet/txn_participant.cc: http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/txn_participant.cc@41 PS5, Line 41: namespace kudu { > warning: using decl 'Substitute' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 07 Apr 2021 07:48:47 +0000 Gerrit-HasComments: Yes