Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22867 )
Change subject: KUDU-1973: Send no-op heartbeat operations batched - PART 1 ...................................................................... Patch Set 12: (19 comments) Thanks a lot for the patch! It seems I need more time to digest this, but overall it looks OK so far. I posted a few high-level comments and nits, and I'll continue reviewing this tomorrow. http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG@50 PS12, Line 50: tablets tables? Also, what was the replication factor? It was 3, right? http://gerrit.cloudera.org:8080/#/c/22867/12//COMMIT_MSG@69 PS12, Line 69: 30 Just curious -- what was the rationale behind having 30 as the highest number for max batch size in the experiments. I'd also think of the number that's close to the total number of tablet replicas (or maybe 1/2) per tablet server given the distribution of replicas in your experiment. BTW, how does this work if setting this number too high -- is there some timing threshold that starts kicking and in such case (i.e. heartbeats shouldn't be delayed too much, at least not more than their interval)? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/consensus.proto File src/kudu/consensus/consensus.proto: http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/consensus.proto@450 PS12, Line 450: e.g nit: e.g. http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h File src/kudu/consensus/multi_raft_batcher.h: http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@76 PS12, Line 76: uint64_t Subscribe(const PeriodicHeartbeater& heartbeater); : void Unsubscribe(uint64_t id); Please document these, describing the semantics of the parameters as well. http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@106 PS12, Line 106: next_id nit: rename to comply with style guideline for field members (should be next_id_ and similar) http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@109 PS12, Line 109: MonoTime time; : uint64_t id; It would be nice to document the fields and the structure overall. http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@112 PS12, Line 112: fix Dealay fixed delay? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@116 PS12, Line 116: int flush_interval_; Why not to store this as MonoDelta? Can this be 'const' member field? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@118 PS12, Line 118: void StartTimer(); style nit for here and elsewhere: don't intersperse method with fields -- first come the methods (member functions), the come member fields. http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.h@154 PS12, Line 154: int flush_interval_; Why not to store it as MonoDelta? Can this be 'const' member field? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc File src/kudu/consensus/multi_raft_batcher.cc: http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@57 PS12, Line 57: DEFINE_int32(multi_raft_heartbeat_interval_ms, : 100, : "The heartbeat interval for batch Raft replication."); Is there any meaningful invariant between --multi_raft_heartbeat_interval_ms and --raft_heartbeat_interval_ms (and maybe some other flags) that makes sense to enforce for consistency of the system's behavior? For example, does it makes sense to make sure multi_raft_heartbeat_interval_ms is never set lower than raft_heartbeat_interval_ms, etc.? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@90 PS12, Line 90: DCHECK nit: DCHECK_EQ might be more expressive DCHECK_EQ(1, peers_.count(id)); http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@120 PS12, Line 120: <std::mutex> nit for here and elsewhere: in C++17 the compiler is able to deduce the type from the argument of the guard's constructor, so this might be omitted http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@128 PS12, Line 128: size() nit: using empty() for this is usually preferred due to better expression and performance http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@135 PS12, Line 135: MonoDelta::FromMilliseconds(FLAGS_raft_heartbeat_interval_ms); nit: create an const instance of MonoDelta once outside of the loop, and use it here http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@138 PS12, Line 138: current_calls.emplace_back(peer_it->second, front.id); Could it happen that the size of the resulted RPC goes over the limit defined by the --rpc_max_message_size flag? If so, what might be the best place to catch that condition and handle it appropriately? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@238 PS12, Line 238: auto res = batchers_.find(hostport); : if (res != batchers_.end() nit for here and elsewhere: consider using helper from gutil/map-util.h (e.g., FindOrNull(), etc.) http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/consensus/multi_raft_batcher.cc@239 PS12, Line 239: batcher = res->second.lock() BTW, under what conditions the weak pointer stored in the hash map couldn't be upgraded to a shared pointer? Maybe, it makes sense to add DCHECK to verify for particular invariants? http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/22867/12/src/kudu/tserver/tablet_service.cc@1809 PS12, Line 1809: class nit: is this needed here once the corresponding header is included? -- To view, visit http://gerrit.cloudera.org:8080/22867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538 Gerrit-Change-Number: 22867 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Martonka <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Tue, 12 Aug 2025 05:20:45 +0000 Gerrit-HasComments: Yes
