Wei-Chiu Chuang created HDDS-13788:
--------------------------------------
Summary: [Docs] Add performance troubleshooting doc
Key: HDDS-13788
URL: https://issues.apache.org/jira/browse/HDDS-13788
Project: Apache Ozone
Issue Type: Task
Reporter: Wei-Chiu Chuang
I'd like to add a new page, "Troubleshoot performance issues", to the user
documentation under the Troubleshooting section
[https://ozone.apache.org/docs/edge/troubleshooting.html].
Alternatively, it could go on the Observability page
[https://ozone.apache.org/docs/edge/feature/observability.html].
It will include:
1. Flame graph
If a particular operation runs slowly and CPU utilization is high, use a flame
graph to inspect hotspots.
Enable the flame graph endpoints (hdds.profiler.endpoint.enabled = true),
download async-profiler, and start async-profiler 2.x from the endpoint or from
the command line. The output is exported to an SVG or HTML file.
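As a minimal sketch of the endpoint route, the helper below builds the URL to
request from one process. The /prof path, the `duration` and `event` query
parameters, the host name, and port 9874 for the OM web server are assumptions
for illustration; check them against the Ozone version in use.

```python
from urllib.parse import urlencode


def build_profiler_url(host, port, duration_s=30, event="cpu"):
    """Build the profiler servlet URL for a single Ozone process.

    Assumption: the servlet is served at /prof and accepts `duration`
    (seconds) and `event` query parameters.
    """
    query = urlencode({"duration": duration_s, "event": event})
    return f"http://{host}:{port}/prof?{query}"


if __name__ == "__main__":
    # Hypothetical host name; 9874 is assumed to be the OM HTTP port.
    print(build_profiler_url("om1.example.com", 9874))
```

The printed URL can be opened in a browser or fetched with any HTTP client
once hdds.profiler.endpoint.enabled is set to true on that process.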
The flame graph is collected on a per-process basis. To generate flame graphs
across a cluster:
# Download this repo [https://github.com/jojochuang/ozone_perf.git]
# Download async-profiler to /opt/async-profiler-2.8.1-linux-x64/
# Add cluster hostnames to the file cluster_hosts.txt, one hostname per line.
# Update PASSWORDLESS_USER in conf.sh to a user that has passwordless SSH
access to the cluster hosts. This user must also have sudo privileges.
# Run ‘./start_profiles.sh’ to start profiling
# Wait while the workload you want to profile is running
# Run ‘./collect_profiles.sh’ to stop profiling and collect the flame graphs.
They are downloaded and compressed into a tarball. This script collects flame
graphs for Ozone OM, SCM, DN and Recon, HDFS NN and DN, the Impala daemon and
HBase RegionServer.
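The steps above could be driven from a small wrapper, sketched here under
stated assumptions: the repository is already cloned, conf.sh is already
edited, and both scripts are run from the repository root. The function names
and the wait time are hypothetical and are not part of the ozone_perf repo.

```python
import subprocess
import time
from pathlib import Path


def parse_hosts(text):
    """Return hostnames, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_hosts_file(hosts, repo_dir):
    """Write cluster_hosts.txt with one hostname per line."""
    path = Path(repo_dir) / "cluster_hosts.txt"
    path.write_text("\n".join(hosts) + "\n")
    return path


def run_profiling(repo_dir, wait_seconds):
    """Start profiling, wait, then stop and collect the flame graphs."""
    subprocess.run(["./start_profiles.sh"], cwd=repo_dir, check=True)
    time.sleep(wait_seconds)  # keep the workload of interest running meanwhile
    subprocess.run(["./collect_profiles.sh"], cwd=repo_dir, check=True)
```

For example, write_hosts_file(parse_hosts(raw_text), "ozone_perf") followed by
run_profiling("ozone_perf", 300) would profile for roughly five minutes.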