[ https://issues.apache.org/jira/browse/HIVE-20262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562590#comment-16562590 ]
Ashutosh Chauhan commented on HIVE-20262:
-----------------------------------------

You should also update the column statistics: essentially, scale the column stats (NDV, null counts, etc.) by the same factor as the row count. That can be done in a follow-up.

+1

> Implement stats annotation rule for the UDTFOperator
> ----------------------------------------------------
>
>                 Key: HIVE-20262
>                 URL: https://issues.apache.org/jira/browse/HIVE-20262
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>         Attachments: HIVE-20262.1.patch, HIVE-20262.2.patch, HIVE-20262.patch
>
>
> User Defined Table Functions (UDTFs) change the number of rows in the output.
> A common UDTF is explode(), which creates one output row for each element of each array in the input column.
>
> Right now, the estimated number of output rows is equal to the number of input rows.
> But if a UDTF produces more than one output row per input row on average, the resulting row count is underestimated in the execution plan.
>
> Implement a rule that takes a factor X as a parameter and, for each UDTF, predicts that:
>
> {code:java}
> number of output rows = X * number of input rows{code}
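A minimal sketch of the proposed rule together with the follow-up suggested in the comment (scaling column stats by the same factor as the row count). The types `Stats` and `ColStats` and the method `scale` are hypothetical stand-ins, not Hive's own statistics classes or the attached patch. Rounding up and capping NDV and null counts at the new row count are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not Hive's statistics classes.
public class UdtfStatsSketch {

    // Per-column statistics: number of distinct values (NDV) and null count.
    record ColStats(String name, long countDistinct, long numNulls) {}

    // Operator-level statistics: estimated row count plus per-column stats.
    record Stats(long numRows, List<ColStats> columns) {}

    // Multiply a count by the factor, rounding up so that a non-empty input
    // never estimates to zero rows (assumption of this sketch).
    static long scaleCount(long count, double factor) {
        return (long) Math.ceil(count * factor);
    }

    // Apply "output rows = X * input rows" and scale column stats by the same X.
    static Stats scale(Stats input, double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        long outRows = scaleCount(input.numRows(), factor);
        List<ColStats> outCols = new ArrayList<>();
        for (ColStats c : input.columns()) {
            // Assumption: cap both values at the new row count, since a column
            // cannot have more distinct values or nulls than there are rows.
            long ndv = Math.min(scaleCount(c.countDistinct(), factor), outRows);
            long nulls = Math.min(scaleCount(c.numNulls(), factor), outRows);
            outCols.add(new ColStats(c.name(), ndv, nulls));
        }
        return new Stats(outRows, outCols);
    }

    public static void main(String[] args) {
        Stats in = new Stats(1000, List.of(new ColStats("user_id", 200, 10)));
        // With X = 2.5, every count is multiplied by 2.5.
        System.out.println(scale(in, 2.5));
    }
}
```

Scaling every column uniformly is the simple approach described in the comment; a finer-grained rule might scale only the columns generated by the UDTF and leave pass-through columns' NDV unchanged.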