Mahesh Reddy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19931 )

Change subject: KUDU-3476: Make replica placement range and table aware
......................................................................


Patch Set 12:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19931/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19931/11//COMMIT_MSG@25
PS11, Line 25: controls whether or not the replica placement is range aware.
> Just for reference: the following changelist provides a proper link to the
Shared the correct public facing design doc and sent doc to mailing list.


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/integration-tests/create-table-itest.cc
File src/kudu/integration-tests/create-table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/integration-tests/create-table-itest.cc@484
PS10, Line 484:      ASSERT_EQ(range_bounds.size(), table.second.size());
              :       for (const auto& ranges : table.second) {
              :         // Since there are ten buckets instead of five for 
table "table1",
              :         // we expect twice as many rep
> nit: Maybe add a comment on how did we arrive at these numbers?
Changed the values from being hardcoded to better understand


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master.proto@338
PS10, Line 338: // The number of tablets that are BOOTSTRAPPING or RUNNING per 
range.
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/master_service.cc@459
PS10, Line 459:                             
req->num_live_tablets_by_dimension().e
> Maybe add a todo if planning to address in future.
Done


http://gerrit.cloudera.org:8080/#/c/19931/11/src/kudu/master/placement_policy-test.cc
File src/kudu/master/placement_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/19931/11/src/kudu/master/placement_policy-test.cc@1066
PS11, Line 1066: ASSERT_LE(748
> Curious as to how did we arrive at this range [748, 752]?
In most cases it should be 750, but there's some flakiness so to safeguard 
against that I expanded the range a little.


http://gerrit.cloudera.org:8080/#/c/19931/11/src/kudu/master/placement_policy-test.cc@1441
PS11, Line 1441: 3110
> nit: Should this be 3110? Also curious on what causes the deviation here fr
Correct, it should be 3110, I fixed that. For some of the test cases, there 
will be some flakiness as it's not expected or important for us to see the same 
value each time. As long as it's in a range near the expected value, it works 
fine. As to what causes the flakiness, the recentReplicaCreation values by 
range and table are eventually decayed back to 0 so it can be a result of the 
decay and the initial load on these servers. The decay won't be the same each 
time, it depends on how long it takes to place replicas on each tserver, so 
that varies run to run so sometimes we'll see a slight deviation.


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc
File src/kudu/master/ts_descriptor.cc:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/master/ts_descriptor.cc@214
PS10, Line 214:   const double kHalflifeSecs = 
FLAGS_tserver_last_replica_creations_halflife_ms / 1000;
              :   const MonoTime now = MonoTime::Now();
              :   // If map for the table or range hasn't been initialized yet, 
use init_time_ as last decay.
              :   MonoTime last_decay =
              :       last_replica_decay_by_range_.find(table_id) == 
last_replica_decay_by_range_.end() ||
              :       
last_replica_decay_by_range_[table_id].first.find(range_start_key) ==
              :       last_replica_decay_by_range_[table_id].first.end() ?
              :       init_time_ : 
last_replica_decay_by_range_[table_id].first[range_start_key];
              :   double secs_since_last_decay = (now - last_decay).To
> Probably, I haven't properly articulated my question.  I was curious exactl
The purpose of decay in this context is to eventually "reset" these values to 
zero. Without the decay, the recentReplicaCreation values for ranges/table 
would just continue to increase. Since we're only interested in the number of 
replica creations recently, it makes sense for these values to reset to zero 
since eventually the tablet servers would report the number of live replicas by 
range/table. We only need the values for recently created replicas while the 
servers wouldn't report the values, so after let's say a creating a table and 
all the replicas are placed, it's safe to decay the values back to 0 as the 
tablet servers would reflect the newly placed replicas through 
GetNumLiveTabletsByRangePerTable().


http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

http://gerrit.cloudera.org:8080/#/c/19931/10/src/kudu/tserver/ts_tablet_manager.h@222
PS10, Line 222: umLiveTabletsByRangePerT
> Maybe this will help:
Thanks Abhishek, including replicas in an INTIIALIZED state makes sense to me. 
The states FAILED, STOPPING, AND STOPPED probably shouldn't be included as 
tombstoned replicas don't service write/read requests anyways. Let me know what 
you guys think about adding INITIALIZED state and not the others. Should I add 
it to the two methods above, GetNumLiveTablets() and 
GetNumLiveTabletsByDimension()?



--
To view, visit http://gerrit.cloudera.org:8080/19931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9caeb8d5547e946bfeb152a99e1ec034c3fa0a0f
Gerrit-Change-Number: 19931
Gerrit-PatchSet: 12
Gerrit-Owner: Mahesh Reddy <mre...@cloudera.com>
Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ara...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mre...@cloudera.com>
Gerrit-Reviewer: Marton Greber <greber...@gmail.com>
Gerrit-Reviewer: Zoltan Chovan <zcho...@cloudera.com>
Gerrit-Reviewer: Ádám Bakai <aba...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 00:22:39 +0000
Gerrit-HasComments: Yes

Reply via email to