Wei-Chiu Chuang created HDDS-13788:
--------------------------------------

             Summary: [Docs] Add performance troubleshooting doc
                 Key: HDDS-13788
                 URL: https://issues.apache.org/jira/browse/HDDS-13788
             Project: Apache Ozone
          Issue Type: Task
            Reporter: Wei-Chiu Chuang


I'd like to add a new page, "Troubleshoot performance issues", to the user 
documentation under the Troubleshooting section 
[https://ozone.apache.org/docs/edge/troubleshooting.html]

Alternatively, it could go on the Observability page. 
[https://ozone.apache.org/docs/edge/feature/observability.html]

It will include:

1. Flame graph

If a particular operation runs slowly and CPU utilization is high, use a flame 
graph to find the CPU hotspots.

Enable the flame graph endpoint (hdds.profiler.endpoint.enabled = true), 
download async-profiler, and start async-profiler 2.x either from the endpoint 
or from the command line. The output is exported as an SVG or HTML flame graph 
file.
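To illustrate the endpoint route, here is a minimal Python sketch that builds profiler URLs for the Ozone daemons on one host. It assumes, without confirmation from this issue, that hdds.profiler.endpoint.enabled is set to true in ozone-site.xml, that the daemon can locate async-profiler (the HBase-derived servlet reads ASYNC_PROFILER_HOME; verify this for your release), that the endpoint is served at /prof with duration, event and output query parameters, and that the daemons listen on the default HTTP ports. All of these should be checked against your Ozone version and ozone-site.xml.

```python
"""Sketch: build profiler endpoint URLs for the Ozone daemons on one host.

Assumptions (not confirmed by this issue): the endpoint is served at /prof,
accepts duration/event/output query parameters, and each daemon uses the
default HTTP port below. Check ozone-site.xml for the real ports.
"""
from urllib.parse import urlencode

# Assumed default HTTP ports for each Ozone daemon.
DEFAULT_HTTP_PORTS = {"om": 9874, "scm": 9876, "dn": 9882, "recon": 9888}


def profiler_url(host: str, role: str, duration_s: int = 30,
                 event: str = "cpu", output: str = "svg") -> str:
    """URL that asks one daemon for a profile lasting duration_s seconds."""
    port = DEFAULT_HTTP_PORTS[role]
    query = urlencode({"duration": duration_s, "event": event,
                       "output": output})
    return f"http://{host}:{port}/prof?{query}"


def all_urls(host: str, duration_s: int = 30) -> dict[str, str]:
    """Profiler URLs for every daemon role, in case they share one host."""
    return {role: profiler_url(host, role, duration_s)
            for role in DEFAULT_HTTP_PORTS}
```

For example, profiler_url("om1.example.com", "om", 60) yields http://om1.example.com:9874/prof?duration=60&event=cpu&output=svg (om1.example.com is a placeholder host). Opening such a URL in a browser while the slow operation is being reproduced keeps the profile focused on the problem.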

Flame graphs are collected per process. To generate flame graphs across a 
cluster:
 # Clone this repo [https://github.com/jojochuang/ozone_perf.git]
 # Download async-profiler 2.8.1 and extract it to 
/opt/async-profiler-2.8.1-linux-x64/
 # Add cluster hostnames to the file cluster_hosts.txt, one hostname per line.
 # Update PASSWORDLESS_USER in conf.sh to a user that has passwordless ssh 
capability in the cluster. This user must also have sudo privileges.
 # Run ‘./start_profiles.sh’ to kick off profiling
 # Wait while the workload you want to profile runs.
 # Run ‘./collect_profiles.sh’ to stop profiling and collect the flame graphs. 
They are downloaded and compressed into a tarball. The script collects flame 
graphs for the Ozone OM, SCM, DN and Recon, the HDFS NN and DN, the Impala 
daemon and the HBase RegionServer.
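The numbered steps above can be sketched in Python as follows. This is not the ozone_perf implementation; it is a hypothetical illustration that assumes async-profiler 2.x's profiler.sh start and stop actions at the path above, passwordless ssh/scp with sudo for a user like the one set in PASSWORDLESS_USER (the value "perfuser" is made up), and JVM main class names for the Ozone daemons that you should confirm with jps -l. It covers only the Ozone daemons, not HDFS, Impala or HBase.

```python
"""Hypothetical sketch of the cluster-wide flow; NOT the ozone_perf scripts."""
import subprocess
import tarfile
from pathlib import Path

# Hypothetical values mirroring conf.sh and the async-profiler install path.
PASSWORDLESS_USER = "perfuser"
PROFILER = "/opt/async-profiler-2.8.1-linux-x64/profiler.sh"

# Assumed JVM main classes of the Ozone daemons, used to find PIDs with
# pgrep -f. Confirm them with `jps -l` on a node before relying on them.
MAIN_CLASSES = {
    "om": "org.apache.hadoop.ozone.om.OzoneManagerStarter",
    "scm": "org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter",
    "dn": "org.apache.hadoop.ozone.HddsDatanodeService",
    "recon": "org.apache.hadoop.ozone.recon.ReconServer",
}


def read_hosts(path: str) -> list[str]:
    """Read cluster_hosts.txt: one hostname per line, blank lines ignored."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def remote_path(host: str, role: str) -> str:
    """Where one daemon's flame graph is written on its host."""
    return f"/tmp/flamegraph-{role}-{host}.html"


def profiler_cmd(host: str, role: str, action: str) -> list[str]:
    """ssh command that starts or stops CPU profiling of one daemon."""
    cls = MAIN_CLASSES[role]
    # Bracketing the first character (e.g. '[o]rg...') stops pgrep from
    # matching the remote shell whose command line contains the pattern.
    pattern = f"[{cls[0]}]{cls[1:]}"
    pid = f"$(pgrep -f '{pattern}' | head -n 1)"
    if action == "start":
        args = f"start -e cpu {pid}"
    elif action == "stop":
        # async-profiler 2.x writes an HTML flame graph for a .html file.
        args = f"stop -f {remote_path(host, role)} {pid}"
    else:
        raise ValueError(f"unknown action: {action}")
    return ["ssh", f"{PASSWORDLESS_USER}@{host}", f"sudo {PROFILER} {args}"]


def copy_cmd(host: str, role: str, local_dir: str) -> list[str]:
    """scp command that downloads one flame graph into local_dir."""
    src = f"{PASSWORDLESS_USER}@{host}:{remote_path(host, role)}"
    return ["scp", src, local_dir]


def make_tarball(files: list[str], dest: str) -> None:
    """Compress the downloaded flame graphs into one .tar.gz archive."""
    with tarfile.open(dest, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)


def run(cmds: list[list[str]]) -> None:
    """Run commands one by one, tolerating failures: a host that does not
    run a given daemon makes that daemon's command fail."""
    for cmd in cmds:
        subprocess.run(cmd, check=False)


def main(action: str, hosts_file: str = "cluster_hosts.txt") -> None:
    """'start' begins profiling everywhere; 'collect' stops and gathers."""
    pairs = [(h, r) for h in read_hosts(hosts_file) for r in MAIN_CLASSES]
    if action == "start":
        run([profiler_cmd(h, r, "start") for h, r in pairs])
    elif action == "collect":
        out = Path("flamegraphs")
        out.mkdir(exist_ok=True)
        run([profiler_cmd(h, r, "stop") for h, r in pairs])
        run([copy_cmd(h, r, str(out)) for h, r in pairs])
        graphs = sorted(str(p) for p in out.glob("*.html"))
        make_tarball(graphs, "flamegraphs.tar.gz")
    else:
        raise ValueError(f"unknown action: {action}")
```

Usage follows the same rhythm as the scripts: call main("start"), reproduce the slow operation, then call main("collect"). Trying every daemon on every host keeps the host file a plain list instead of a role map, at the cost of some expected failures where a daemon is not running.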

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
