Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22867 )

Change subject: KUDU-1973: Send no-op heartbeat operations batched - PART 1
......................................................................


Patch Set 12:

(19 comments)

Thanks a lot for the patch!

It seems I need more time to digest this, but overall it looks OK so far.  I 
posted a few high-level comments and nits, and I'll continue reviewing this 
tomorrow.

http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG@50
PS12, Line 50: tablets
tables?

Also, what was the replication factor?  It was 3, right?


http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG@69
PS12, Line 69: 30
Just curious -- what was the rationale behind having 30 as the highest number 
for max batch size in the experiments.

I'd also think of the number that's close to the total number of tablet 
replicas (or maybe 1/2) per tablet server given the distribution of replicas in 
your experiment.

BTW, how does this work if setting this number too high -- is there some timing 
threshold that starts kicking and in such case (i.e. heartbeats shouldn't be 
delayed too much, at least not more than their interval)?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/consensus.proto@450
PS12, Line 450: e.g
nit: e.g.


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h
File src/kudu/consensus/multi_raft_batcher.h:

http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@76
PS12, Line 76:   uint64_t Subscribe(const PeriodicHeartbeater& heartbeater);
             :   void Unsubscribe(uint64_t id);
Please document these, describing the semantics of the parameters as well.


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@106
PS12, Line 106: next_id
nit: rename to comply with style guideline for field members (should be 
next_id_ and similar)


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@109
PS12, Line 109:     MonoTime time;
              :     uint64_t id;
It would be nice to document the fields and the structure overall.


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@112
PS12, Line 112: fix Dealay
fixed delay?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@116
PS12, Line 116:   int flush_interval_;
Why not to store this as MonoDelta?

Can this be 'const' member field?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@118
PS12, Line 118:   void StartTimer();
style nit for here and elsewhere: don't intersperse method with fields -- first 
come the methods (member functions), the come member fields.


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@154
PS12, Line 154: int flush_interval_;
Why not to store it as MonoDelta?

Can this be 'const' member field?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc
File src/kudu/consensus/multi_raft_batcher.cc:

http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@57
PS12, Line 57: DEFINE_int32(multi_raft_heartbeat_interval_ms,
             :              100,
             :              "The heartbeat interval for batch Raft 
replication.");
Is there any meaningful invariant between --multi_raft_heartbeat_interval_ms 
and --raft_heartbeat_interval_ms (and maybe some other flags) that makes sense 
to enforce for consistency of the system's behavior?

For example, does it makes sense to make sure multi_raft_heartbeat_interval_ms 
is never set lower than raft_heartbeat_interval_ms, etc.?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@90
PS12, Line 90: DCHECK
nit: DCHECK_EQ might be more expressive

  DCHECK_EQ(1, peers_.count(id));


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@120
PS12, Line 120: <std::mutex>
nit for here and elsewhere: in C++17 the compiler is able to deduce the type 
from the argument of the guard's constructor, so this might be omitted


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@128
PS12, Line 128: size()
nit: using empty() for this is usually preferred due to better expression and 
performance


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@135
PS12, Line 135: MonoDelta::FromMilliseconds(FLAGS_raft_heartbeat_interval_ms);
nit: create an const instance of MonoDelta once outside of the loop, and use it 
here


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@138
PS12, Line 138:     current_calls.emplace_back(peer_it->second, front.id);
Could it happen that the size of the resulted RPC goes over the limit defined 
by the --rpc_max_message_size flag?  If so, what might be the best place to 
catch that condition and handle it appropriately?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@238
PS12, Line 238:   auto res = batchers_.find(hostport);
              :   if (res != batchers_.end()
nit for here and elsewhere: consider using helper from gutil/map-util.h (e.g., 
FindOrNull(), etc.)


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@239
PS12, Line 239: batcher = res->second.lock()
BTW, under what conditions the weak pointer stored in the hash map couldn't be 
upgraded to a shared pointer?  Maybe, it makes sense to add DCHECK to verify 
for particular invariants?


http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/tserver/tablet_service.cc@1809
PS12, Line 1809: class
nit: is this needed here once the corresponding header is included?



--
To view, visit http://gerrit.cloudera.org:8080/22867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
Gerrit-Change-Number: 22867
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Tue, 12 Aug 2025 05:20:45 +0000
Gerrit-HasComments: Yes

Reply via email to