[ 
https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159618#comment-14159618
 ] 

Brock Noland commented on HIVE-7156:
------------------------------------

bq. This works exactly as expected when tez is not being used.

This is not exactly true..this change creates a non-optional dependency on Tez. 
The store behind alternative execution engines has always been that they are 
completely optional. This is codified by the fact that the tez deps are marked 
optional.

This code has to be modified to remove the required tez dependency.

> Group-By operator stat-annotation only uses distinct approx to generate 
> rollups
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-7156
>                 URL: https://issues.apache.org/jira/browse/HIVE-7156
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.14.0
>            Reporter: Gopal V
>            Assignee: Prasanth J
>            Priority: Blocker
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, 
> HIVE-7156.4.patch, HIVE-7156.5.patch, HIVE-7156.6.patch, HIVE-7156.7.patch, 
> HIVE-7156.8.patch, HIVE-7156.8.patch, HIVE-7156.9.patch, hive-debug.log.bz2
>
>
> The stats annotation for a group-by only annotates the reduce-side row-count 
> with the distinct values.
> The map-side gets the row-count as the rows output instead of distinct * 
> parallelism, while the reducer side gets the correct parallelism.
> {code}
> hive> explain select distinct L_SHIPDATE from lineitem;
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: lineitem
>                   Statistics: Num rows: 5999989709 Data size: 4745677733354 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: l_shipdate (type: string)
>                     outputColumnNames: l_shipdate
>                     Statistics: Num rows: 5999989709 Data size: 4745677733354 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: l_shipdate (type: string)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5999989709 Data size: 
> 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 5999989709 Data size: 
> 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Reducer 2 
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1955 Data size: 183770 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: string)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1955 Data size: 183770 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to