Alessandro Solimando created HIVE-25734:
-------------------------------------------

             Summary: Wrongly-typed constant in case expression leads to 
incorrect empty result
                 Key: HIVE-25734
                 URL: https://issues.apache.org/jira/browse/HIVE-25734
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 4.0.0
            Reporter: Alessandro Solimando


 

The type of constants in case expressions should be inferred, if possible, by 
the "surrounding" input reference columns, if any.

Consider the following table and query: 
{code:java}
create external table test_case (row_seq smallint, row_desc string) stored as 
parquet;
insert into test_case values (1, 'a');
insert into test_case values (2, 'aa');
insert into test_case values (6, 'aaaaaa');

with base_t as (select row_seq, row_desc,
  case row_seq
    when 1 then '34'
    when 6 then '35'
    when 2 then '36'
  end as zb from test_case where row_seq in (1,2,6))
select row_seq, row_desc, zb from base_t where zb <> '34';{code}
The aforementioned query fails by returning an empty results, while "1 a 34" is 
expected.

 

To understand the root cause, let's consider the debug input and output of some 
related CBO rules which are triggered during the evaluation of the query: 

 
{noformat}
--$0 is the column 'row_seq'
1. HiveReduceExpressionsRule
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
'36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
=($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
2. HivePointLookupOptimizerRule.RexTransformIntoInClause
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
=($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
3. HivePointLookupOptimizerRule.RexMergeInClause
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
Output: false{noformat}
In the first part, we can see that the constants are correctly typed as 
"SMALLINT" in the first part of the "AND" operand, while they are typed as 
"INTEGER" for the "CASE" expression, despite the input reference "$0" being 
available for inferring a more precise type.

This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" 
missing the commonality between the two "IN" expressions, whose intersection is 
considered empty, hence the empty result.

Providing a more refined type inference for "case" expressions should fix the 
issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to