Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21316 )

Change subject: [metrics] Add metrics for create and delete op time
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21316/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21316/1//COMMIT_MSG@11
PS1, Line 11: These monitoring metrics will be very helpful for analyzing
            : issues related to high CPU usage.
Just curious: did you see that creating/deleting a tablet consume a lot of CPU 
for some reason?

If you are interested in tracking CPU usage, I'd think that  more appropriate 
measurement in this context would be real, system, and user time consumed, not 
just difference in wall clock time that's captured by the newly introduced 
metrics, no?


http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/tablet_server-test.cc@4057
PS1, Line 4057: CHECK_EQ
Here and below: use ASSERT in tests, not CHECK


http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/tablet_server-test.cc@4132
PS1, Line 4132:   LOG(INFO) << "metric delete_tablet_run_time MinValue: "
              :         << delete_tablet_run_time->histogram()->MinValue()
              :         << ", MeanValue: " << 
delete_tablet_run_time->histogram()->MeanValue()
              :         << ", MaxValue: " << 
delete_tablet_run_time->histogram()->MaxValue()
              :         << ", TotalCount: " << 
delete_tablet_run_time->histogram()->TotalCount()
              :         << ", TotalSum: " << 
delete_tablet_run_time->histogram()->TotalSum();
Why not to use HdrHistogram::DumpHumanReadable() here instead?


http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/ts_tablet_manager-test.cc
File src/kudu/tserver/ts_tablet_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/ts_tablet_manager-test.cc@151
PS1, Line 151:     CHECK_EQ(before_cnt + 1, 
create_tablet_run_time->TotalCount());
Why to assert this invariant right here?

If you want to check the functionality of new newly introduced metric, maybe 
add a new test scenario to test it and have proper assertion there, but not in 
this common method?


http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/ts_tablet_manager-test.cc@152
PS1, Line 152:     LOG(INFO) << "metric create_tablet_run_time MinValue: "
             :         << create_tablet_run_time->histogram()->MinValue()
             :         << ", MeanValue: " << 
create_tablet_run_time->histogram()->MeanValue()
             :         << ", MaxValue: " << 
create_tablet_run_time->histogram()->MaxValue()
             :         << ", TotalCount: " << 
create_tablet_run_time->histogram()->TotalCount()
             :         << ", TotalSum: " << 
create_tablet_run_time->histogram()->TotalSum();
Is it possible to use HdrHistogram::DumpHumanReadable() here instead?

BTW, why to add this?  Our current tests are quite verbose already, so I'd 
think twice before adding extra output which isn't used to distinguish between 
passing and failing tests programmatically or useful for 
troubleshooting/debugging of failing tests by humans.


http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/21316/1/src/kudu/tserver/ts_tablet_manager.cc@674
PS1, Line 674:   MonoTime time_started = MonoTime::Now();
If you include time taken to open a newly created tablet into the whole 
measurement, then why not to include the time taken to acquire tablet's manager 
lock as well?  In other words, why not to sample 'time_started' at line 654?



--
To view, visit http://gerrit.cloudera.org:8080/21316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd52013caa94a33143cb16ff3831a49b74bac4
Gerrit-Change-Number: 21316
Gerrit-PatchSet: 1
Gerrit-Owner: KeDeng <kdeng...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Comment-Date: Wed, 17 Apr 2024 07:21:31 +0000
Gerrit-HasComments: Yes

Reply via email to