Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21356 )

Change subject: [metrics] Add metrics for tablet copy op time
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@227
PS8, Line 227: Tablet Copy Operation Duration
Would 'Tablet Copy Duration' be good enough?


http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@229
PS8, Line 229: on this tablet
Does it make sense to mention this is the duration as seen from the source 
tablet replica?


http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@229
PS8, Line 229: copy tablet
tablet copying


http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tablet/tablet_metrics.cc@231
PS8, Line 231: 60000000LU
Yingchun already pointed this out in PS5, but it seems there is still room for 
improvement.

With the current settings, 60000000 stands for a maximum value of 1 minute (60 
seconds).  Are you sure this makes sense?  I suspect that for a large tablet 
replica being copied over a slow network it might take tens of minutes to 
complete the operation, so an hour for the maximum duration would be prudent.

Also, I don't think it makes a lot of sense to use microseconds as the unit 
for the duration here.  I'd expect milliseconds at the finest.
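
To make the arithmetic above concrete, here is a minimal standalone sketch using 
std::chrono.  The constant and function names are hypothetical and not part of 
Kudu; it only shows what the PS8 maximum means in seconds and what a one-hour 
maximum would be if the unit were milliseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical names for illustration only; none of these exist in Kudu.

// The histogram maximum used in PS8, interpreted in microseconds.
constexpr std::int64_t kPs8MaxMicros = 60000000;

// Converts an upper bound given in microseconds into whole seconds.
constexpr std::int64_t MaxTrackableSeconds(std::int64_t max_us) {
  return std::chrono::duration_cast<std::chrono::seconds>(
      std::chrono::microseconds(max_us)).count();
}

// One hour expressed in milliseconds, the upper bound suggested in the review.
constexpr std::int64_t kOneHourMillis =
    std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::hours(1)).count();

// Compile-time checks of the arithmetic.
static_assert(MaxTrackableSeconds(kPs8MaxMicros) == 60,
              "the PS8 maximum covers only one minute");
static_assert(kOneHourMillis == 3600000,
              "one hour is 3,600,000 milliseconds");
```

With milliseconds as the unit, a one-hour maximum would be 3600000 rather than 
60000000.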


http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_service-test.cc
File src/kudu/tserver/tablet_copy_service-test.cc:

http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_service-test.cc@237
PS8, Line 237: TEST_F(TabletCopyMetricTest, TestRunTimeMetricFinishState) {
             :   const auto before_cnt = CopyRunTime()->TotalCount();
             :   string session_id;
             :   ASSERT_OK(DoBeginValidTabletCopySession(&session_id));
             :
             :   EndTabletCopySessionResponsePB resp;
             :   RpcController controller;
             :   ASSERT_OK(DoEndTabletCopySession(session_id, true, nullptr, &resp, &controller));
             :   ASSERT_EQ(before_cnt + 1, CopyRunTime()->TotalCount());
             : }
Does it make sense to verify the corresponding metric at the destination 
tablet replica as well?  I guess it should show 0 for TotalCount(), right?


http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_source_session.cc
File src/kudu/tserver/tablet_copy_source_session.cc:

http://gerrit.cloudera.org:8080/#/c/21356/8/src/kudu/tserver/tablet_copy_source_session.cc@514
PS8, Line 514:     int64_t elapsed_ms = (MonoTime::Now() - start_time_).ToMilliseconds();
             :     metrics->tablet_copy_duration->Increment(elapsed_ms);
In tablet_metrics.cc, tablet_copy_duration is defined in microseconds in 
PS8, so this code is inconsistent.
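
As a rough illustration of why the mismatch matters, the standalone sketch below 
(hypothetical helper names, std::chrono instead of Kudu's MonoTime) converts the 
same elapsed time into both units.  A value computed in milliseconds but 
recorded into a histogram declared in microseconds would be read as 1000 times 
shorter than it really was.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helpers for illustration only; not Kudu code.

// Elapsed time expressed in whole milliseconds.
constexpr std::int64_t AsMillis(std::chrono::nanoseconds d) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
}

// The same elapsed time expressed in whole microseconds.
constexpr std::int64_t AsMicros(std::chrono::nanoseconds d) {
  return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// An assumed 5-second copy, chosen only to make the numbers easy to read.
constexpr std::chrono::seconds kElapsed{5};

// 5 s is 5000 ms but 5000000 us: recording 5000 into a histogram whose
// unit is microseconds would make the copy look like it took 5 ms.
static_assert(AsMillis(kElapsed) == 5000, "5 s == 5000 ms");
static_assert(AsMicros(kElapsed) == 5000000, "5 s == 5000000 us");
static_assert(AsMicros(kElapsed) == 1000 * AsMillis(kElapsed),
              "the two units differ by a factor of 1000");
```

Either the metric's unit or the value recorded at the call site would need to 
change so that both agree.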



--
To view, visit http://gerrit.cloudera.org:8080/21356
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088f6a9a8a07ad39ca95ae8b4995ce00d1a0d00c
Gerrit-Change-Number: 21356
Gerrit-PatchSet: 8
Gerrit-Owner: KeDeng <kdeng...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: KeDeng <kdeng...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <laiyingc...@apache.org>
Gerrit-Comment-Date: Fri, 17 May 2024 18:14:55 +0000
Gerrit-HasComments: Yes
