Alexander Behm created IMPALA-6024:
--------------------------------------

             Summary: Add minimum sample size for COMPUTE STATS TABLESAMPLE
                 Key: IMPALA-6024
                 URL: https://issues.apache.org/jira/browse/IMPALA-6024
             Project: IMPALA
          Issue Type: Sub-task
          Components: Frontend
    Affects Versions: Impala 2.10.0
            Reporter: Alexander Behm
            Assignee: Alexander Behm


We should introduce a minimum sample size in bytes for COMPUTE STATS 
TABLESAMPLE. Reasons:
* For small tables, sampling does not make sense because accurate stats can be 
obtained cheaply without it.
* Very small samples mostly do not make sense either: some minimum amount of 
data is required to get meaningful stats.

I think a 1GB minimum might be a good choice, and ideally this minimum sample 
size would be configurable.
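
To make the proposal concrete, here is a rough sketch of how such a minimum 
could interact with the requested sampling percentage. It is only an 
illustration, not Impala's implementation: the function name, the parameter 
names and the reading of "1GB" as 2^30 bytes are all assumptions made for this 
example.

```python
# Hypothetical default for the proposed configurable minimum (assumed 1 GiB).
GIB = 1024 ** 3


def effective_sample_bytes(table_bytes, sample_pct, min_sample_bytes=GIB):
    """Return how many bytes to sample, or None to skip sampling entirely.

    table_bytes:      total size of the table in bytes
    sample_pct:       percentage requested in the TABLESAMPLE clause (0-100)
    min_sample_bytes: proposed minimum sample size in bytes
    """
    if not 0 <= sample_pct <= 100:
        raise ValueError("sample_pct must be between 0 and 100")

    # Small table: the minimum already covers the whole table, so computing
    # stats over all of the data is cheap and sampling gains nothing.
    if table_bytes <= min_sample_bytes:
        return None

    # Otherwise honor the requested percentage, but never drop below the
    # minimum needed for meaningful stats.
    requested = table_bytes * sample_pct // 100
    return max(requested, min_sample_bytes)
```

Under these assumptions, a 512MB table would be scanned in full, a 1% sample 
of a 5GB table would be raised to the 1GB minimum, and a 10% sample of a 100GB 
table would be left unchanged at 10GB.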

Many other DBMSs offer stats collection with sampling, and in many cases a 
minimum sample size is required to get meaningful stats.
