[
https://issues.apache.org/jira/browse/DRILL-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225231#comment-14225231
]
Deneche A. Hakim commented on DRILL-1742:
-----------------------------------------
Sure. Here are the tests I did on my laptop using a local installation of
hadoop-0.20.2 and hive-0.12.0.
1. When a table has just been created and no rows have been added to it, the
"numRows" property isn't available. In this case HiveScan.getScanStats() uses
the size of the input splits to estimate the number of rows and gets 0, so the
estimate is correct.
2.
a) Adding rows to the table using "LOAD DATA ..." does add a "numRows" property
to the table (and its partitions, if any), but its value is still 0. In this
case HiveScan.getScanStats() uses the size of the input splits to estimate the
number of rows. The estimate isn't accurate, but it's better than the value in
the stats.
b) Running "ANALYZE TABLE table_name COMPUTE STATISTICS" in Hive updates the
"numRows" property with the correct number of rows. This time
HiveScan.getScanStats() uses this value rather than estimating one from the
size of the input splits.
3. When the table has partitions, "numRows" is computed and available for each
partition. HiveScan correctly computes the reduced row count when some of the
partitions are pruned.
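To make the three cases above concrete, here is a minimal sketch of the
decision they describe. It is not the code from the patch: the class
HiveRowCountSketch, the Unit holder, and the ASSUMED_BYTES_PER_ROW constant are
hypothetical names made up for illustration, and the bytes-per-row value is an
arbitrary assumption. Only the "numRows" property name and the fallback to the
input split size come from the tests described here.

```java
import java.util.List;
import java.util.Map;

final class HiveRowCountSketch {
    // Assumption: an arbitrary average row width, used only by the size-based fallback.
    static final long ASSUMED_BYTES_PER_ROW = 100L;

    /** Hypothetical stand-in for a table or a partition. */
    static final class Unit {
        final Map<String, String> params; // Hive parameters, may or may not contain "numRows"
        final long splitBytes;            // total size of the input splits, in bytes

        Unit(Map<String, String> params, long splitBytes) {
            this.params = params;
            this.splitBytes = splitBytes;
        }
    }

    /** Uses "numRows" when it is present and positive, otherwise estimates from the split size. */
    static long estimateRows(Unit unit) {
        String raw = unit.params.get("numRows");
        if (raw != null) {
            try {
                long numRows = Long.parseLong(raw.trim());
                if (numRows > 0) {
                    return numRows; // case 2b: stats were computed by ANALYZE TABLE
                }
            } catch (NumberFormatException ignored) {
                // Unparseable value: fall through to the size-based estimate.
            }
        }
        // Case 1 (property missing) and case 2a (property present but 0).
        return unit.splitBytes / ASSUMED_BYTES_PER_ROW;
    }

    /** Case 3: sums the per-partition estimates over the partitions that survive pruning. */
    static long estimateRowsAfterPruning(List<Unit> survivingPartitions) {
        long total = 0;
        for (Unit partition : survivingPartitions) {
            total += estimateRows(partition);
        }
        return total;
    }
}
```

With these assumed numbers, a unit whose "numRows" is 0 and whose splits total
1000 bytes is estimated at 10 rows, while a unit with "numRows" set to 42 is
reported as 42 rows regardless of its split size.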
The only limitation is that HiveScan.getScanStats() assumes that when
statistics are available for a table, they are up to date. This may require the
user to manually run "ANALYZE TABLE ... COMPUTE STATISTICS" to refresh them.
> Use Hive stats when planning queries on Hive data sources
> ---------------------------------------------------------
>
> Key: DRILL-1742
> URL: https://issues.apache.org/jira/browse/DRILL-1742
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization, Storage - Hive
> Affects Versions: 0.6.0
> Reporter: Venki Korukanti
> Assignee: Deneche A. Hakim
> Fix For: 0.7.0
>
> Attachments: DRILL-1742.1.patch.txt, DRILL-1742.2.patch.txt,
> DRILL-1742.3.patch.txt, DRILL-1742.4.patch.txt
>
>