Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
......................................................................


Patch Set 7: Code-Review+1

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc
File src/kudu/integration-tests/txn_commit-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302
PS7, Line 302: DeleteTable
Does it work as expected as well if dropping a tablet (just a part of a table), 
say, dropping one range of a range-partitioned table?  Is it worth adding a 
test scenario for such case?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@316
PS7, Line 316: TestRestartingWhileCommitting
I think it would be great to add a scenario when TxnStatusManager instance is 
frozen/unfrozen (i.e. sent SIGSTOP/SIGCONT) during the commit phase.  That 
should include a case when the leadership is transferred to another instance 
because of the unresponsive process hosting the former TxnStatusManager leader 
replica, but once the commit finalization is picked up by the new leader, the 
former leader is back and tries to continue its the commit phase finalization 
routine.

It guess it might be a good test to verify the robustness of the commit phase 
and re-enterability of corresponding methods.

What do you think?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@565
PS7, Line 565:
             :   // Wait a bit to allow would-be background aborts to happen.
             :   SleepFor(MonoDelta::FromSeconds(1));
Is this going to put the txn keepalive thread to sleep as well?  I'm not sure.

Instead, why not to move the 'txn' handle into a sub-scope, and serialize it 
into a string which is stored in the top-level scope.  Then, when the original 
'txn' scope goes out of scope, it's possible to de-serialize the handle and use 
the de-serialized handle for calls like KuduTransaction::IsCommitComplete() and 
alike.

What do you think?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690
PS7, Line 690: tsm_ts_->Shutdown();
I guess a real 'abrupt' way of terminating the process is sending SIGKILL :)

Do you think it's worth adding a scenario like that in this test suite?  Doing 
that in a separate change list might be the best option, I guess.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754
PS7, Line 754: }
It would be great to add at least one failure scenario, when finalizing the 
commit phase fails to verify that TxnManager doesn't crash and the transaction 
fails to finalize:
  * one participant responses with an error to its BEGIN_COMMIT request (e.g., 
only one replica out of 3 is running -- BTW, is it going to turn into 
Status::TimedOut() seen at the TxnStatusManager's side or that would be 
something else?)
  * sending the commit timestamp from a participant fails (again, because of 
only one replica out of 3 is running)


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152
PS4, Line 152:
> I don't think so, since I'm pretty sure the client automatically retries Se
Ah, indeed.  It seems I misunderstood where the status comes from (even if you 
added a comment :) ).

Thank you for the clarification!


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201:     if (PREDICT_FALSE(s.IsNotFound())) {
             :       LOG(INFO) << Substitute("Participant $0 was not found: $1",
             :                               participant_ids_[participant_idx], 
s.ToString());
After some consideration, I'm not quite sure this 'good-riddance' semantic is a 
good fit for all cases.

E.g., what if some batch of data is being inserted, and a tablet (or a whole 
table) is dropped unintentionally during that process?  From the user 
perspective (who isn't aware of the dropped tablet/table) that should be 
certainly an error -- the data that they thought is reliably persisted after 
the commit has successfully finalized is actually gone (what a surprise, isn't 
it?).

Should we instead fail a transaction is such a case?  Maybe, at least add a 
flag to make this behavior configurable, where by default a transaction fails 
to commit if one of its actors with some added data disappeared during the 
commit phase?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@286
PS7, Line 286:   DCHECK_EQ(0, ops_in_flight_);
             :   ops_in_flight_ = participant_ids_.size();
nit: just for the sake of working with std::atomic in a consistent way, maybe 
replace this with something like:

  auto old_val = ops_in_flight_.exchange(participant_ids_.size());
  DCHECK_EQ(0, old_val);

?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@463
PS7, Line 463: MonoDelta::FromSeconds(10)
Should this be a parameter controlled by a run-time flag?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@471
PS7, Line 471: txn_client
nit: maybe, move this outside of the 'for ()' cycle?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@492
PS7, Line 492:   // Stop any previously on-going tasks.
             :   for (auto& txn_id_and_tasks : commits_in_flight) {
             :     txn_id_and_tasks.second->stop();
             :   }
Why not to do so prior to starting new commit tasks?  I guess it might be less 
threads spawn if doing it the other way, no?

If not, I guess it would be great to document the rationale behind the current 
sequence.



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Feb 2021 06:19:50 +0000
Gerrit-HasComments: Yes

Reply via email to