Andrew Wong created KUDU-3168:
---------------------------------

             Summary: Script or document how to identify tables that suffer 
from KUDU-1400
                 Key: KUDU-3168
                 URL: https://issues.apache.org/jira/browse/KUDU-3168
             Project: Kudu
          Issue Type: Task
          Components: compaction, documentation
    Affects Versions: 1.8.0
            Reporter: Andrew Wong


In older versions of Kudu, tables may suffer from KUDU-1400 when workloads are 
sequential and have a low ingest rate (e.g. KBs per minute). Today, the way to 
identify the issue is to notice that scans of a specific tablet are slow for the 
amount of data being scanned, look at that tablet's rowset layout diagram, and 
note that it contains a large number of small rowsets. The guidance then is 
usually to rewrite the table with a higher ingest rate (e.g. through an Impala 
CTAS), which typically solves the issue for that table.

Users may want ways to identify tables (not tablets) that suffer from this 
issue so they can be rewritten. It would be nice to document how to do this 
using existing tooling (or make a script available).

The {{kudu fs list}} tool with the {{rowset-id}} column filter seems like it 
would be useful in identifying how many rowsets there are and how large they 
are. Used together with {{kudu table list}}, users ought to be able to easily 
identify affected tables.
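As a rough sketch of what such tooling might look like, the offline tool could 
dump (tablet, rowset) pairs and a small pipeline could count rowsets per tablet. 
The {{kudu fs list}} invocation below is shown as a comment since it must run 
against a tablet server's data directories; the flag values and paths are 
illustrative, and the sample input is made up:

```shell
# Hypothetical first step (paths illustrative), run on a tablet server:
#
#   kudu fs list --fs_wal_dir=/var/lib/kudu/wal \
#       --fs_data_dirs=/var/lib/kudu/data \
#       --columns=tablet-id,rowset-id --format=csv > rowsets.csv
#
# Then count distinct rowsets per tablet; tablets with very many rowsets
# relative to their on-disk size are KUDU-1400 suspects. Sample data:
rowsets_csv='tablet-a,0
tablet-a,1
tablet-a,2
tablet-b,0'
printf '%s\n' "$rowsets_csv" | sort -u | cut -d, -f1 | uniq -c | sort -rn
```

Mapping the resulting tablet IDs back to table names via {{kudu table list}} 
would then surface the affected tables.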

In later versions of Kudu, the {{num_rowsets_on_disk}} metric should be useful 
in identifying such cases (compute {{tablet size / num_rowsets_on_disk}} and 
compare against 32MB, the current default target DRS size).
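For the metric-based check, a one-liner over lines of "tablet size / rowset 
count" data could flag tablets whose average DiskRowSet size is well under the 
32MB target. The input values below are made up for illustration; in practice 
they would come from the tablet's on-disk size and its 
{{num_rowsets_on_disk}} metric:

```shell
# Flag tablets whose average rowset size is far below the 32 MiB target
# (here: below a quarter of it). Input columns: tablet_id size_bytes num_rowsets.
printf '%s\n' \
  'tablet-healthy 335544320 10' \
  'tablet-kudu1400 67108864 500' \
| awk '{ avg = $2 / $3; if (avg < 32 * 1024 * 1024 / 4) print $1, int(avg) }'
```

Only {{tablet-kudu1400}} is printed, since its average rowset is roughly 128 KiB.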



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
