[ https://issues.apache.org/jira/browse/KUDU-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-3060:
------------------------------
    Labels: roadmap-candidate  (was: )

> Add a tool to identify potential performance bottlenecks
> --------------------------------------------------------
>
>                 Key: KUDU-3060
>                 URL: https://issues.apache.org/jira/browse/KUDU-3060
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI, perf, ui
>            Reporter: Andrew Wong
>            Priority: Major
>              Labels: roadmap-candidate
>
> When we hear users wondering why their workloads are slower than expected,
> some common questions arise. It'd be great if we had a single tool (or a
> single webpage) that aggregated and displayed useful information for a
> specific tablet or table. Things like, for a specific table:
> - How many partitions and replicas exist for the table.
> - For those replicas, how they are distributed across tablet servers.
> - For those tablet servers, what the block cache configuration is, and what
> the current block cache stats (hit ratio, evictions, etc) are.
> - For those tablet servers, which tablets have been written to recently.
> - For those tablet servers, which tablets within the target table have been
> written to recently.
> - For those tablet servers, how many active and non-expired scanners exist.
> - For those tablet servers, which tablets within the target table have been
> read from recently.
> - For those tablet servers, how many ongoing tablet copies there are both to
> and from the server.
> - For those tablet servers, how many data directories there are.
> - For the data directories on those tablet servers, how many replicas are
> spreading data in each directory, how many blocks there are in each, and how
> much space is available in each.
> The list could go on and on. It probably makes sense to break the diagnostics
> into different phases or goals, maybe along the lines of 1) identifying
> hotspots of workloads and lag across tablet servers (e.g. a ton of writes
> going to a single tserver), and 2) digging into a single tablet server to
> understand how it's provisioned and whether that provisioning is sufficient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
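
For illustration only, the first phase mentioned in the description (spotting a tserver that receives a disproportionate share of writes) might be sketched roughly as below. Everything here is an assumption rather than part of Kudu or of this proposal: the function name `find_write_hotspots`, the `(tserver, tablet, rows_written)` input tuples, and the 3x-median threshold are all hypothetical, and a real tool would have to gather the per-replica counters from the tablet servers itself.

```python
from collections import defaultdict
from statistics import median


def find_write_hotspots(replica_writes, ratio=3.0):
    """Return tserver ids whose total recent writes look disproportionate.

    replica_writes: iterable of (tserver_id, tablet_id, rows_written)
        tuples. This shape is a hypothetical input format, not a Kudu API.
    ratio: a tserver is flagged when its total exceeds ratio * median
        of all per-tserver totals (an arbitrary illustrative threshold).
    """
    # Sum the writes of every replica hosted on each tablet server.
    totals = defaultdict(int)
    for tserver_id, _tablet_id, rows_written in replica_writes:
        totals[tserver_id] += rows_written

    if not totals:
        return []

    # Use the median as the baseline so one hot server does not skew it.
    baseline = median(totals.values())
    return sorted(
        tserver_id
        for tserver_id, total in totals.items()
        if total > 0 and total > ratio * baseline
    )


if __name__ == "__main__":
    # Made-up sample numbers, purely to exercise the function.
    sample = [
        ("ts-1", "tablet-a", 100),
        ("ts-2", "tablet-b", 120),
        ("ts-3", "tablet-c", 5000),
        ("ts-3", "tablet-d", 4000),
        ("ts-1", "tablet-e", 50),
    ]
    print(find_write_hotspots(sample))
```

The median is used as the baseline instead of the mean so that a single very hot tserver does not raise the threshold enough to hide itself; the same grouping could be repeated per tablet to narrow a hotspot down further.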