[ 
https://issues.apache.org/jira/browse/HIVE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030342#comment-16030342
 ] 

Pengcheng Xiong commented on HIVE-16792:
----------------------------------------

Yes, this could be implemented as an improvement to hive.optimize.filter.stats.reduction.

> Estimate Rows When Joining BIGINT to INT Column
> -----------------------------------------------
>
>                 Key: HIVE-16792
>                 URL: https://issues.apache.org/jira/browse/HIVE-16792
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.1.1
>            Reporter: BELUGA BEHR
>            Priority: Minor
>
> {code:sql}
> create table test1
> (a int);
> create table test2
> (z bigint);
> INSERT INTO test1 VALUES (1);
> INSERT INTO test2 VALUES (2147483648);
> analyze table test1 compute statistics for columns;
> analyze table test2 compute statistics for columns;
> EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
> {code}
> {code}
> Explain
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
>
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t2 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t2 
>           TableScan
>             alias: t2
>             filterExpr: z is not null (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
>             Filter Operator
>               predicate: z is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
>               HashTable Sink Operator
>                 keys:
>                   0 UDFToLong(a) (type: bigint)
>                   1 z (type: bigint)
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t1
>             filterExpr: UDFToLong(a) is not null (type: boolean)
>             Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: UDFToLong(a) is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>               Map Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 UDFToLong(a) (type: bigint)
>                   1 z (type: bigint)
>                 outputColumnNames: _col0, _col4
>                 Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: int), _col4 (type: bigint)
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> I would expect Hive to recognize that this join cannot produce any rows,
> because the minimum value of test2.z (2147483648) is greater than
> Integer.MAX_VALUE (2147483647).
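
For illustration only, the check the reporter describes can be sketched as a range-disjointness test on the two join keys. This is not Hive's implementation: ColumnRange, ranges_overlap and estimate_join_rows are hypothetical names invented for this sketch. It assumes the optimizer knows a [min, max] range for each key, taken from column statistics where they exist (test2.z) and from the type's domain otherwise (test1.a is an INT and has no column stats in the plan above).

```python
from dataclasses import dataclass
from typing import Optional

# Bounds of a 32-bit signed integer, i.e. the domain of a Hive INT column.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical holder for the [low, high] range of a join key."""
    low: int
    high: int


def ranges_overlap(left: ColumnRange, right: ColumnRange) -> bool:
    """Return True if the two closed intervals share at least one value."""
    return left.low <= right.high and right.low <= left.high


def estimate_join_rows(left: ColumnRange, right: ColumnRange) -> Optional[int]:
    """Return 0 when an equi-join on these keys cannot match any row.

    Returns None when the ranges overlap, meaning this shortcut does not
    apply and the regular estimator should be used instead.
    """
    if not ranges_overlap(left, right):
        return 0
    return None


# test1.a has no column stats, so fall back to the INT type domain.
a_range = ColumnRange(INT_MIN, INT_MAX)
# test2.z holds the single value 2147483648, so min == max.
z_range = ColumnRange(2147483648, 2147483648)

print(estimate_join_rows(a_range, z_range))
```

Because the range of z starts one above INT_MAX, the two intervals are disjoint and the sketch reports zero rows, whereas the plan above estimates one row for the Map Join Operator.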



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)