[jira] [Commented] (HIVE-20262) Implement stats annotation rule for the UDTFOperator

Ashutosh Chauhan (JIRA) Mon, 30 Jul 2018 15:14:53 -0700


    [ https://issues.apache.org/jira/browse/HIVE-20262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562590#comment-16562590 ]


Ashutosh Chauhan commented on HIVE-20262:
-----------------------------------------

You should also update the column statistics. Essentially, scale the column 
stats (NDV, null counts, etc.) by the same factor as the row count. That can 
be done in a follow-up. 
+1 
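
To illustrate the suggestion above, here is a minimal sketch of scaling
per-column statistics by the same factor that is applied to the row count.
The class ScaledColumnStats and its fields are hypothetical and stand in for
whatever column-statistics object the rule actually updates; this is not the
code from the attached patches. It applies the comment literally and makes no
attempt to bound the scaled NDV.

```java
// Hypothetical illustration only: a stand-in for a column-statistics holder,
// not Hive's own statistics classes.
final class ScaledColumnStats {
    final long ndv;       // number of distinct values
    final long numNulls;  // number of NULL values

    ScaledColumnStats(long ndv, long numNulls) {
        if (ndv < 0 || numNulls < 0) {
            throw new IllegalArgumentException("statistics must be non-negative");
        }
        this.ndv = ndv;
        this.numNulls = numNulls;
    }

    // Returns a new instance with every statistic multiplied by the factor
    // that was used to scale the row count.
    ScaledColumnStats scaleBy(double factor) {
        if (Double.isNaN(factor) || factor < 0) {
            throw new IllegalArgumentException("factor must be non-negative");
        }
        return new ScaledColumnStats(scale(ndv, factor), scale(numNulls, factor));
    }

    // Multiplies and rounds to the nearest long, saturating at Long.MAX_VALUE
    // so that a large estimate cannot overflow into a negative value.
    private static long scale(long value, double factor) {
        double scaled = value * factor;
        return scaled >= (double) Long.MAX_VALUE ? Long.MAX_VALUE : Math.round(scaled);
    }
}
```

With a factor of 3.0, a column with an NDV of 50 and 10 nulls would be
reported with an NDV of 150 and 30 nulls.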

> Implement stats annotation rule for the UDTFOperator
> ----------------------------------------------------
>
>                 Key: HIVE-20262
>                 URL: https://issues.apache.org/jira/browse/HIVE-20262
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>         Attachments: HIVE-20262.1.patch, HIVE-20262.2.patch, HIVE-20262.patch
>
>
> User Defined Table Functions (UDTFs) change the number of rows of the output. 
> A common UDTF is the explode() function, which creates a row for each element 
> of each array in the input column.
>  
> Right now, the estimated number of output rows is equal to the number of 
> input rows. But if a UDTF emits more than one row per input row on average, 
> the resulting number of rows is underestimated in the execution plan.
>  
> Implement a rule that takes a factor X as a parameter and, for each UDTF, 
> predicts that:
>  
> {code:java}
> number of output rows = X * number of input rows{code}
>  
>  
>  
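
As a rough sketch of the rule described above, the estimate is a single
multiplication of the input row count by the factor X. The class and method
names below are hypothetical and are not taken from the attached patches; the
overflow handling is an added assumption, since row estimates are held in a
long.

```java
// Hypothetical illustration only: not the rule implemented in the patches.
final class UdtfRowEstimate {
    private UdtfRowEstimate() {}

    // Predicts: number of output rows = factor * number of input rows.
    // The result is rounded to the nearest long and saturates at
    // Long.MAX_VALUE instead of overflowing.
    static long estimateOutputRows(long inputRows, double factor) {
        if (inputRows < 0 || Double.isNaN(factor) || factor < 0) {
            throw new IllegalArgumentException("inputRows and factor must be non-negative");
        }
        double scaled = inputRows * factor;
        if (scaled >= (double) Long.MAX_VALUE) {
            return Long.MAX_VALUE;
        }
        return Math.round(scaled);
    }
}
```

With a factor of 3.0, an input estimate of 1000 rows becomes 3000 output
rows; with a factor of 1.0 the estimate matches the current behaviour.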



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
