[ https://issues.apache.org/jira/browse/KUDU-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616182#comment-17616182 ]
ASF subversion and git services commented on KUDU-2597:
-------------------------------------------------------

Commit e5ace5fa28154fa906c1b087e3b80461b6d85337 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=e5ace5fa2 ]

KUDU-2353: tool to parse metrics out of diagnostic logs

This patch contains a C++ implementation of the metrics log parser script. There are a couple of functional differences between this tool and the existing script:
- This tool recognizes table metrics.
- This tool allows filtering metrics by table or tablet identifier.
- For histogram metrics, this tool also outputs the approximate count of measurements.

This patch also addresses KUDU-2597, a subtask of the JIRA item in the summary line (KUDU-2353).

NOTES:
- Kudu metrics are only written to the diagnostic log when a metric has changed, so this patch tracks the metric values per entity (tablet ID, table ID, server) at each point in time in order to output the correct values. This means that if, within a given set of files, a tablet's metric has not changed and no corresponding records are in the files, this tool prints no information about that tablet's metrics.
- Kudu histogram metrics do output a percentile summary. The tool explicitly does not use it and instead computes the percentiles from the histogram counts. While less accurate (IIUC, the counts can be lossy), this allows us to generate aggregated summaries across multiple entities.
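The first note (values are only logged on change, so state must be tracked per entity) can be illustrated with a minimal Python sketch. It is not the tool's C++ code: `CarryForwardTracker` is a hypothetical name, and it assumes a simple metric is reported as the sum over entities and that a rate metric is the change in that sum divided by the elapsed seconds. That rate assumption appears consistent with the first two rows of the example output in this commit message, whose timestamps are in microseconds.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CarryForwardTracker:
    """Hypothetical helper: keeps the last-known value of one metric per entity.

    Records only contain metrics that changed, so an entity missing from a
    record is assumed to still hold its previous value.
    """
    last_values: Dict[str, int] = field(default_factory=dict)
    prev_total: Optional[int] = None
    prev_ts_us: Optional[int] = None

    def update(self, ts_us: int, changed: Dict[str, int]) -> Tuple[int, float]:
        """Apply one record's changed values; return (total, per-second rate)."""
        # Merge into the existing state instead of replacing it, so unchanged
        # entities keep contributing their earlier values.
        self.last_values.update(changed)
        total = sum(self.last_values.values())
        if self.prev_total is None or self.prev_ts_us is None:
            rate = 0.0  # first sample: nothing to diff against
        else:
            elapsed_s = (ts_us - self.prev_ts_us) / 1e6  # microseconds -> seconds
            rate = (total - self.prev_total) / elapsed_s if elapsed_s > 0 else 0.0
        self.prev_total, self.prev_ts_us = total, ts_us
        return total, rate


# Totals and timestamps come from the first two rows of the example output;
# the split between "tablet-a" and "tablet-b" is made up for illustration.
tracker = CarryForwardTracker()
print(tracker.update(1611432793767488, {"tablet-a": 60000, "tablet-b": 8492}))  # -> (68492, 0.0)
print(tracker.update(1611432853767552, {"tablet-a": 223024}))  # tablet-b carried forward
```

The second call returns a total of 231516 and a rate of roughly 2717.06 per second, in line with the `num_scanners_started` and `scanners_started_per_sec` columns of that row. Had `tablet-b` been dropped instead of carried forward, the total would undercount.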
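The second note rests on a general property: per-entity percentile summaries cannot, in general, be combined into a correct overall percentile, but bucket counts can simply be summed. The following Python sketch assumes each entity's histogram is available as a mapping from bucket value to count. The function names are hypothetical, and this is not how the Kudu tool reads the diagnostic log format. Because results are bucket values rather than exact measurements, they inherit any lossiness in the bucketing.

```python
import math
from collections import Counter
from typing import Iterable, Mapping


def merge_histograms(histograms: Iterable[Mapping[int, int]]) -> Counter:
    """Sum per-bucket counts from several entities into one histogram."""
    merged: Counter = Counter()
    for hist in histograms:
        merged.update(hist)  # Counter.update adds counts rather than replacing
    return merged


def percentile(hist: Mapping[int, int], pct: float) -> int:
    """Smallest bucket value whose cumulative count covers pct% of measurements."""
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be in [0, 100]")
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = max(1, math.ceil(total * pct / 100.0))  # 1-based rank
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= target:
            return value
    return max(hist)  # not reached for pct in [0, 100]


# Two hypothetical tablets' histograms: {bucket value in us: count}.
merged = merge_histograms([{100: 3, 200: 1}, {100: 1, 400: 5}])
print(sum(merged.values()))    # -> 10 measurements in total
print(percentile(merged, 0))   # -> 100
print(percentile(merged, 50))  # -> 200
print(percentile(merged, 99))  # -> 400
```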
Here's an example of the tool in action:

[awong@va1022 release]$ ./bin/kudu diagnose parse_metrics kudu-tserver.worker12.foobar.com.kudu.diagnostics.20210123-201217.0.74565 --simple_metrics=tablet.scans_started:num_scans_started --rate_metrics=tablet.scans_started:scans_started_per_sec --histogram_metrics=server.scanner_duration:scanner_duration_us,server.handler_latency_kudu_tserver_TabletServerService_Scan:scan_rpc_us
I0131 11:53:27.010298 151768 diagnostics_log_parser.cc:272] collecting simple metric tablet.scans_started as num_scanners_started
I0131 11:53:27.010438 151768 diagnostics_log_parser.cc:279] collecting rate metric tablet.scans_started as scanners_started_per_sec
I0131 11:53:27.010455 151768 diagnostics_log_parser.cc:286] collecting histogram metric server.handler_latency_kudu_tserver_TabletServerService_Scan as scan_rpc_us
I0131 11:53:27.010524 151768 diagnostics_log_parser.cc:286] collecting histogram metric server.scanner_duration as scanner_duration_us
timestamp num_scanners_started scanners_started_per_sec scan_rpc_us_count scan_rpc_us_min scan_rpc_us_p50 scan_rpc_us_p75 scan_rpc_us_p95 scan_rpc_us_p99 scan_rpc_us_p99_99 scan_rpc_us_max scanner_duration_us_count scanner_duration_us_min scanner_duration_us_p50 scanner_duration_us_p75 scanner_duration_us_p95 scanner_duration_us_p99 scanner_duration_us_p99_99 scanner_duration_us_max
1611432793767488 68492 0 434170147 2 1215 1639 3711 12927 501759 8650751 1854125 12 23295 1302527 54788095 60030975 60030975 60030975
1611432853767552 231516 2717.0637684653134 434198546 2 1215 1639 3711 12927 501759 8650751 1854200 12 23295 1302527 54788095 60030975 60030975 60030975
1611432913767616 349073 1959.2812434333403 434227285 2 1215 1639 3711 12927 501759 8650751 1854306 12 23295 1302527 54788095 60030975 60030975 60030975
1611432973767689 829597 8008.7235893863 434255021 2 1215 1639 3711 12927 501759 8650751 1854517 12 23295 1302527 54788095 60030975 60030975 60030975
1611433033767772 926516 1615.314432148369 434283184 2 1215 1639 3711 12927 501759 8650751 1854605 12 23295 1302527 54788095 60030975 60030975 60030975
1611433093767841 926626 1.8333312250024245 434309627 2 1215 1639 3711 12927 501759 8650751 1854719 12 23295 1302527 54788095 60030975 60030975 60030975
1611433153767902 960053 557.11610026529809 434339928 2 1215 1639 3711 12927 501759 8650751 1854788 12 23295 1302527 54788095 60030975 60030975 60030975
1611433213767967 1009625 826.19910495096963 434366776 2 1215 1639 3711 12927 501759 8650751 1854831 12 23295 1302527 54788095 60030975 60030975 60030975
1611433273768032 1059960 838.91575784126235 434394555 2 1215 1639 3711 12927 501759 8650751 1854966 12 23295 1302527 54788095 60030975 60030975 60030975
1611433333768067 1061577 26.949984279175837 434420683 2 1215 1639 3711 12927 501759 8650751 1855023 12 23295 1302527 54788095 60030975 60030975 60030975
1611433393768130 1082096 341.98297425121041 434447991 2 1215 1639 3711 12927 501759 8650751 1855185 12 23295 1302527 54788095 60030975 60030975 60030975
1611433453768205 1083102 16.76664570835953 434476348 2 1215 1639 3711 12927 501759 8650751 1855285 12 23295 1302527 54788095 60030975 60030975 60030975
1611433513768270 1088338 87.2665721278802 434498551 2 1215 1639 3711 12927 501759 8650751 1855388 12 23295 1302527 54788095 60030975 60030975 60030975

Change-Id: I8077fb4f6b41fe4b2bd6c877af379ea7a9f415b1
Reviewed-on: http://gerrit.cloudera.org:8080/12570
Tested-by: Kudu Jenkins
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>
Reviewed-by: Alexey Serbin <ale...@apache.org>

> Add CLI tool to parse metrics from diagnostics log
> --------------------------------------------------
>
>                 Key: KUDU-2597
>                 URL: https://issues.apache.org/jira/browse/KUDU-2597
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: CLI, metrics, ops-tooling
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Major
>
> We have a somewhat-crufty 'parse_metrics_log.py' script that isn't
> particularly great.
> It'd be nice if metrics parsing were baked into the CLI
> to provide something more fit for human consumption: a tsv, a summary of
> different perf metrics, etc.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)