Alexander Behm created IMPALA-6024:
--------------------------------------

Summary: Add minimum sample size for COMPUTE STATS TABLESAMPLE
Key: IMPALA-6024
URL: https://issues.apache.org/jira/browse/IMPALA-6024
Project: IMPALA
Issue Type: Sub-task
Components: Frontend
Affects Versions: Impala 2.10.0
Reporter: Alexander Behm
Assignee: Alexander Behm
We should introduce a minimum sample size in bytes for COMPUTE STATS TABLESAMPLE. Reasons:
* For small tables, sampling does not make sense. Accurate stats can be obtained cheaply without sampling.
* Very small sample sizes mostly do not make sense; some minimum amount of data is required to get meaningful stats.

I think a 1GB minimum might be a good choice, and ideally this minimum sample size would be configurable. Many other DBMSs support stats collection with sampling, and in many cases they require a minimum sample size to get any meaningful stats.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
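The proposed rule could be sketched roughly as follows. This assumes the minimum applies to the number of bytes the sample would scan, that a table no larger than the minimum is simply scanned in full, and that a too-small request is raised to the minimum rather than rejected. The class `SampleSizeSketch`, the method `effectiveSamplePercent`, and the `DEFAULT_MIN_SAMPLE_BYTES` constant are hypothetical names for illustration only, not Impala's actual frontend code.

```java
/**
 * Hypothetical sketch of a minimum-sample-size rule for
 * COMPUTE STATS ... TABLESAMPLE. Names and defaults are illustrative
 * assumptions, not Impala's real implementation.
 */
public final class SampleSizeSketch {
    // 1GB, the minimum suggested in the issue; assumed to be configurable.
    public static final long DEFAULT_MIN_SAMPLE_BYTES = 1L << 30;

    private SampleSizeSketch() {}

    /**
     * Returns the percentage of the table to scan for stats.
     * A result of 100.0 means "do not sample; compute exact stats".
     *
     * @param tableBytes       total size of the table in bytes
     * @param requestedPercent sample percentage from the TABLESAMPLE clause, in (0, 100]
     * @param minSampleBytes   minimum number of bytes a sample must cover
     */
    public static double effectiveSamplePercent(
            long tableBytes, double requestedPercent, long minSampleBytes) {
        if (tableBytes < 0 || minSampleBytes < 0
                || requestedPercent <= 0 || requestedPercent > 100) {
            throw new IllegalArgumentException("invalid sampling parameters");
        }
        // Small table: exact stats are cheap, so skip sampling entirely.
        if (tableBytes <= minSampleBytes) {
            return 100.0;
        }
        double requestedBytes = tableBytes * requestedPercent / 100.0;
        // The requested sample is already large enough: honor it unchanged.
        if (requestedBytes >= minSampleBytes) {
            return requestedPercent;
        }
        // Otherwise raise the percentage so the sample covers the minimum.
        // This stays below 100 because tableBytes > minSampleBytes here.
        return 100.0 * minSampleBytes / tableBytes;
    }

    public static void main(String[] args) {
        long min = DEFAULT_MIN_SAMPLE_BYTES;
        // 500MB table: smaller than the minimum, so no sampling.
        System.out.println(effectiveSamplePercent(500L << 20, 10.0, min)); // prints 100.0
        // 100GB table at 10%: a 10GB sample already exceeds 1GB.
        System.out.println(effectiveSamplePercent(100L << 30, 10.0, min)); // prints 10.0
        // 4GB table at 10%: 0.4GB is too small, raised to 1GB (25%).
        System.out.println(effectiveSamplePercent(4L << 30, 10.0, min));   // prints 25.0
    }
}
```

Raising the percentage is only one possible design choice; an implementation could instead reject a statement whose requested sample falls below the minimum, or emit a warning and proceed.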