Alessandro Solimando created HIVE-25734: -------------------------------------------
Summary: Wrongly-typed constant in case expression leads to incorrect empty result Key: HIVE-25734 URL: https://issues.apache.org/jira/browse/HIVE-25734 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0 Reporter: Alessandro Solimando The type of constants in case expressions should be inferred, if possible, by the "surrounding" input reference columns, if any. Consider the following table and query: {code:java} create external table test_case (row_seq smallint, row_desc string) stored as parquet; insert into test_case values (1, 'a'); insert into test_case values (2, 'aa'); insert into test_case values (6, 'aaaaaa'); with base_t as (select row_seq, row_desc, case row_seq when 1 then '34' when 6 then '35' when 2 then '36' end as zb from test_case where row_seq in (1,2,6)) select row_seq, row_desc, zb from base_t where zb <> '34';{code} The aforementioned query fails by returning an empty results, while "1 a 34" is expected. To understand the root cause, let's consider the debug input and output of some related CBO rules which are triggered during the evaluation of the query: {noformat} --$0 is the column 'row_seq' 1. HiveReduceExpressionsRule Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), '36':VARCHAR, null:VARCHAR), '34':CHAR(2))) Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER))) 2. HivePointLookupOptimizerRule.RexTransformIntoInClause Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER))) Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER))) 3. HivePointLookupOptimizerRule.RexMergeInClause Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER))) Output: false{noformat} In the first part, we can see that the constants are correctly typed as "SMALLINT" in the first part of the "AND" operand, while they are typed as "INTEGER" for the "CASE" expression, despite the input reference "$0" being available for inferring a more precise type. This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" missing the commonality between the two "IN" expressions, whose intersection is considered empty, hence the empty result. Providing a more refined type inference for "case" expressions should fix the issue. -- This message was sent by Atlassian Jira (v8.20.1#820001)