[ 
https://issues.apache.org/jira/browse/HIVE-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16957:
-------------------------------------------
    Description: 
The idea is to rely as much as possible on the logic in 
ColumnStatsSemanticAnalyzer, as other operations do. In particular, they create 
an 'analyze table t compute statistics for columns' statement, use 
ColumnStatsSemanticAnalyzer to parse it, and connect the resulting plan to the 
existing INSERT/INSERT OVERWRITE statement. The challenge for CTAS or CREATE 
MATERIALIZED VIEW is that the table object does not exist yet, hence we cannot 
rely fully on ColumnStatsSemanticAnalyzer.

Thus, we use the same process, but ColumnStatsSemanticAnalyzer produces a 
statement for column stats collection that uses a table values clause instead 
of the original table reference:
{code}
select compute_stats(col1), compute_stats(col2), compute_stats(col3)
from table(values(cast(null as int), cast(null as int), cast(null as string))) 
as t(col1, col2, col3);
{code}
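
To illustrate the shape of that statement, here is a minimal sketch of how it 
could be assembled from the column names and types of the table being created. 
The class StatsQuerySketch and the method buildStatsQuery are hypothetical 
names used only for this example; this is not the code in 
ColumnStatsSemanticAnalyzer, and it copies the single-argument compute_stats 
call from the statement above as-is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: it only mirrors the statement shape
// shown in the description above.
class StatsQuerySketch {

    // Builds a stats query over a one-row values clause of typed nulls, so the
    // column names and types are available without an existing table object.
    static String buildStatsQuery(List<String> colNames, List<String> colTypes) {
        if (colNames.isEmpty() || colNames.size() != colTypes.size()) {
            throw new IllegalArgumentException(
                "need at least one column and exactly one type per column");
        }
        List<String> aggregates = new ArrayList<>();
        List<String> typedNulls = new ArrayList<>();
        for (int i = 0; i < colNames.size(); i++) {
            aggregates.add("compute_stats(" + colNames.get(i) + ")");
            typedNulls.add("cast(null as " + colTypes.get(i) + ")");
        }
        return "select " + String.join(", ", aggregates)
            + " from table(values(" + String.join(", ", typedNulls) + "))"
            + " as t(" + String.join(", ", colNames) + ")";
    }

    public static void main(String[] args) {
        // Same columns and types as the example statement above.
        System.out.println(buildStatsQuery(
            Arrays.asList("col1", "col2", "col3"),
            Arrays.asList("int", "int", "string")));
    }
}
```

For the three columns above, the sketch yields the same statement as in the 
description, on a single line and without the trailing semicolon.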

> Support CTAS for auto gather column stats
> -----------------------------------------
>
>                 Key: HIVE-16957
>                 URL: https://issues.apache.org/jira/browse/HIVE-16957
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
