[ 
https://issues.apache.org/jira/browse/KUDU-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3060:
------------------------------
    Labels: roadmap-candidate  (was: )

> Add a tool to identify potential performance bottlenecks
> --------------------------------------------------------
>
>                 Key: KUDU-3060
>                 URL: https://issues.apache.org/jira/browse/KUDU-3060
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI, perf, ui
>            Reporter: Andrew Wong
>            Priority: Major
>              Labels: roadmap-candidate
>
> When we hear users wondering why their workloads are slower than expected, 
> some common questions arise. It'd be great if we had a single tool (or a 
> single webpage) that aggregated and displayed useful information for a 
> specific tablet or table. For example, for a specific table:
> - How many partitions and replicas exist for the table.
> - For those replicas, how they are distributed across tablet servers.
> - For those tablet servers, what the block cache configuration is, and what 
> the current block cache stats (hit ratio, evictions, etc.) are.
> - For those tablet servers, which tablets have been written to recently.
> - For those tablet servers, which tablets within the target table have been 
> written to recently.
> - For those tablet servers, how many active and non-expired scanners exist.
> - For those tablet servers, which tablets within the target table have been 
> read from recently.
> - For those tablet servers, how many ongoing tablet copies there are both to 
> and from the server.
> - For those tablet servers, how many data directories there are.
> - For the data directories on those tablet servers, how many replicas store 
> data in each directory, how many blocks each directory contains, and how 
> much space is available in each.
> The list could go on and on. It probably makes sense to break the diagnostics 
> into different phases or goals, maybe along the lines of 1) identifying 
> hotspots of workloads and lag across tablet servers (e.g. a ton of writes 
> going to a single tserver), and 2) digging into a single tablet server to 
> understand how it's provisioned and whether that provisioning is sufficient.
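
A minimal sketch of the first phase described above (spotting a tablet server that receives a disproportionate share of writes) might look like the following. It is not an existing Kudu tool or API: the function names, the input shapes, and the median-based threshold are all assumptions made for illustration, and the sample numbers are made up. In practice the inputs would have to be collected from the cluster by some other means.

```python
from collections import Counter
from statistics import median


def replica_counts(replicas):
    """Count replicas per tablet server.

    `replicas` is a hypothetical list of (tablet_id, tserver_id) pairs.
    """
    return Counter(tserver for _tablet, tserver in replicas)


def find_hotspots(rows_written, factor=2.0):
    """Return tablet servers whose recent write count exceeds
    `factor` times the median across all servers.

    `rows_written` is a hypothetical mapping of tserver_id to the number
    of rows written recently. The median is used so that a single very
    busy server does not raise the threshold it is compared against.
    """
    if not rows_written:
        return []
    threshold = factor * median(rows_written.values())
    return sorted(ts for ts, n in rows_written.items() if n > threshold)


if __name__ == "__main__":
    # Made-up sample data: two tablets, three replicas each.
    replicas = [
        ("tablet-1", "ts-a"), ("tablet-1", "ts-b"), ("tablet-1", "ts-c"),
        ("tablet-2", "ts-a"), ("tablet-2", "ts-b"), ("tablet-2", "ts-c"),
    ]
    writes = {"ts-a": 9000, "ts-b": 500, "ts-c": 500}
    print(replica_counts(replicas))
    print(find_hotspots(writes))
```

With the sample data, the median write count is 500, so only `ts-a` exceeds the threshold of 1000. A real tool would likely want a configurable threshold and per-tablet rather than per-server granularity.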



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
