Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22867 )

Change subject: KUDU-1973: Send no-op heartbeat operations batched - PART 1
......................................................................


Patch Set 15:

(21 comments)

I added a few more comments (mostly nits), but I still need to clarify on 
populating particular fields in the batched consensus requests.  I'm planning 
to do another review round.  Feel free to wait for that feedback before revving 
the patch, or you can revv it right away addressing whatever feedback is 
already present -- that's up to you.

Thank you for your patience!

http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG@69
PS12, Line 69: 30
> Using 10 decreased the cpu usages with ~50%.
I see -- this makes sense, indeed.  I guess it's not important to check much 
higher numbers then.  Thank you for the explanation.

It's great to have a number that doesn't actually depend a lot from the number 
of total number of replicas per tablet server -- we can have a good ballpark 
for the default setting for the corresponding flag.


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.h
File src/kudu/consensus/multi_raft_batcher.h:

http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.h@90
PS15, Line 90: .
nit: remove the stray period?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.h@120
PS15, Line 120: std::mutex heartbeater_lock_
Could you please add a comment to explain what this lock protects?

Semantically, is it just to protect 'closed_'?  If so, maybe use a spinlock 
instead of mutex and update the code correspondingly (i.e. shorten the critical 
section to span just over 'closed_' r/w access)?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.h@124
PS15, Line 124:   friend class MultiRaftManager;
style nit for here and elsewhere: move this into the very beginning of the 
private section, just after the 'private' tag.


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.h@153
PS15, Line 153: std::mutex mutex_;
Please document what this synchronization primitive protects.


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc
File src/kudu/consensus/multi_raft_batcher.cc:

http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@84
PS15, Line 84: namespace {}  // namespace
nit: remove this?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@108
PS15, Line 108: messenger
nit: wrap this into std::move() to avoid inc/dec of the usage counter?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@116
PS15, Line 116: heartbeat_timer_
nit: add DCHECK() to protect against mistakes creating the timer multiple times?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@132
PS15, Line 132:   auto send_until = MonoTime::Now() + batch_time_window_;
              :
              :   std::vector<PeriodicHeartbeater> current_calls;
              :   current_calls.reserve(FLAGS_multi_raft_batch_size);
              :   auto submit_callbacks = [&]() {
              :     auto this_ptr = shared_from_this();
              :     KUDU_WARN_NOT_OK(
              :         raft_pool_token_->Submit([this_ptr, current_calls = 
std::move(current_calls)]() {
              :           this_ptr->SendOutScheduled(current_calls);
              :         }),
              :         "Failed to submit multi-raft heartbeat batcher task");
              :     current_calls.clear();
              :   };
              :
              :   const auto sending_period = 
MonoDelta::FromMilliseconds(FLAGS_raft_heartbeat_interval_ms);
nit: is it possible to move this out of the critical section protected by the 
'heartbeater_lock_' (at least, to make sure memory allocation performed by 
current_calls.reserve() is done outside)?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@171
PS15, Line 171:   auto status = data->controller.status();
What is this for?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@185
PS15, Line 185:   closed_ = true;
nit: could we release the heartbeater_lock_ right after this?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@275
PS15, Line 275:   for (auto& entry : batchers_) {
              :     if (auto batcher = entry.second.lock()) {
              :       batcher->Shutdown();
              :     }
              :   }
              :   batchers_.clear();
              :   if (raft_pool_token_) {
              :     raft_pool_token_->Shutdown();
              :     raft_pool_token_.reset();
              :   }
Would it make sense to move the contents of batches_ and raft_pool_token_ 
(e.g., using swap) under the lock in a protected section, but perform the 
actual shutdown outside of the critical section?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_batcher.cc@287
PS15, Line 287: MultiRaftManager::~MultiRaftManager() {}
nit: move into the header file to become a default destructor?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_consensus_data.h
File src/kudu/consensus/multi_raft_consensus_data.h:

http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_consensus_data.h@50
PS15, Line 50: int
nit: use size_t instead?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/consensus/multi_raft_consensus_data.h@62
PS15, Line 62: res.set_safe_timestamp(req.safe_timestamp());
Here might be dragons.  The presence of this field (a) has some special 
semantics and (b) it's not safe to get the 'default' value if 'safe_timestamp' 
field isn't even set in 'req'.  The latter applies to all other fields here, so 
I think it's necessary to first check for the presence of the field in 'req' 
using 'has_xxx()' method, and only set it in 'res' if it's present in 'req'.


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc@1808
PS15, Line 1808: class
nit: remove 'class'?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc@1809
PS15, Line 1809: consensus::
nit: add corresponding 'using' directive and drop the namespace prefix?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc@1810
PS15, Line 1810: rpc::
nit: drop the namespace prefix -- there is corresponding 'using' entry


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc@1817
PS15, Line 1817: auto
nit: would auto* be better for readbility?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc@1824
PS15, Line 1824: Status
nit: use auto here as well?


http://gerrit.cloudera.org:8080/#/c/22867/15/src/kudu/tserver/tablet_service.cc@1830
PS15, Line 1830: auto state
nit: const auto& ?



--
To view, visit http://gerrit.cloudera.org:8080/22867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
Gerrit-Change-Number: 22867
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Wed, 13 Aug 2025 06:56:46 +0000
Gerrit-HasComments: Yes

Reply via email to