[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963337#comment-14963337 ]

Chetna Chaudhari commented on HIVE-11735:

[~ashutoshc]: The changes to column aliases and RR are needed because we use RR for intermediate tables too. While debugging I noticed that all predicates become column aliases for intermediate tables, so for my query only one column was created due to toLowerCase(). Please correct me if I am wrong.

> Different results when multiple if() functions are used
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
> Reporter: Chetna Chaudhari
> Assignee: Chetna Chaudhari
> Attachments: HIVE-11735.patch
>
> Hive if() udf returns different results when string equality is used as
> condition, with case change.
> Observation:
>    1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are
> treated as equal.
>    2) The rightmost udf result is pushed to predicates on left side, leading
> to the same result for both udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3)) from
> sample;
> This will give result:
> 3 3
> Expected result:
> 4 3
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3)) from
> sample;
> This will give result:
> 4 4
> Expected result:
> 3 4

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
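For reference, the expected results in the description can be checked by hand. The sketch below is only an illustration in plain Java, not Hive code: it assumes case-sensitive string equality (which the expected results imply), ignores NULL handling, and uses a hypothetical helper `minIf` that mimics min(if(name = literal, a, b)) over the single-row table.

```java
import java.util.Collections;
import java.util.List;

final class ExpectedIfResults {

    // Hypothetical helper: evaluates min(if(name = literal, whenTrue, whenFalse))
    // over the given rows, using case-sensitive equality.
    static int minIf(List<String> names, String literal, int whenTrue, int whenFalse) {
        return names.stream()
            .mapToInt(name -> name.equals(literal) ? whenTrue : whenFalse)
            .min()
            .getAsInt();
    }

    public static void main(String[] args) {
        // The table from the reproduction steps holds one row: 'chetna'.
        List<String> sample = Collections.singletonList("chetna");

        // Query 3: prints "4 3"
        System.out.println(minIf(sample, "chetna", 4, 3) + " " + minIf(sample, "Chetna", 4, 3));
        // Query 4: prints "3 4"
        System.out.println(minIf(sample, "Chetna", 4, 3) + " " + minIf(sample, "chetna", 4, 3));
    }
}
```

The two columns differ in both queries, which is what the reported output fails to show.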
[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962009#comment-14962009 ]

Ashutosh Chauhan commented on HIVE-11735:

I think the problem here stems from
{code}
aggregations.put(expressionTree.toStringTree().toLowerCase(), expressionTree);
{code}
I think that for your particular query, removing {{toLowerCase()}} would solve your problem. Do you really need the other changes for column aliases and such in RR?

The intent of this map is to detect duplicate functions in aggregations, so that we do not compute them twice. However, it blindly applies {{toLowerCase()}} to the full expression tree, ignoring the fact that there might be constant literals in there. There are two possible solutions here:
* Eliminate this logic altogether from this phase. Don't bother about duplicates in phase 1 analysis. Instead, write a rule on either the Calcite operator tree or the Hive operator tree that walks the expressions, detects duplicates, and fixes up the operator tree to refer to a single expression tree.
* Write a utility function that takes an expression tree as an argument and returns the lower-case version of its string tree, while leaving constant string literals in their original case. Then use this string representation as the key in that map.

IMHO, Option 1 is the cleaner approach. However, it might be a big change touching various pieces of planning. Option 2 is a much more local and contained change, but kinda inelegant.

cc: [~jpullokkaran] in case he has other ideas.
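Option 2 from the comment above could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual patch: `Node` is a hypothetical stand-in for the parser's tree node (it is not Hive's ASTNode), and `key` and `minIf` are illustrative names. The only idea it shows is lower-casing every token except string literals before using the string form as the map key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class LiteralPreservingKey {

    /** Hypothetical stand-in for a parse-tree node; this is not Hive's ASTNode. */
    static final class Node {
        final String text;
        final boolean stringLiteral;
        final List<Node> children;

        private Node(String text, boolean stringLiteral, Node... children) {
            this.text = text;
            this.stringLiteral = stringLiteral;
            this.children = Arrays.asList(children);
        }

        static Node op(String text, Node... children) {
            return new Node(text, false, children);
        }

        static Node literal(String text) {
            return new Node(text, true);
        }
    }

    /** Lower-cases every token except string literals, which keep their case. */
    static String key(Node node) {
        String self = node.stringLiteral ? node.text : node.text.toLowerCase(Locale.ROOT);
        if (node.children.isEmpty()) {
            return self;
        }
        StringBuilder sb = new StringBuilder("(").append(self);
        for (Node child : node.children) {
            sb.append(' ').append(key(child));
        }
        return sb.append(')').toString();
    }

    /** Builds a toy tree for min(if(name = <literal>, 4, 3)). */
    static Node minIf(String quotedLiteral) {
        return Node.op("MIN",
            Node.op("IF",
                Node.op("=", Node.op("NAME"), Node.literal(quotedLiteral)),
                Node.op("4"),
                Node.op("3")));
    }

    public static void main(String[] args) {
        Map<String, Node> aggregations = new LinkedHashMap<>();
        // The third expression repeats the first and should still be deduplicated.
        for (String literal : new String[] {"'chetna'", "'Chetna'", "'chetna'"}) {
            Node tree = minIf(literal);
            aggregations.put(key(tree), tree);
        }
        aggregations.keySet().forEach(System.out::println);
    }
}
```

With such a key, function and column names still compare case-insensitively, so genuine duplicates collapse to one entry, while expressions that differ only in the case of a literal stay distinct.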
[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961904#comment-14961904 ]

Chetna Chaudhari commented on HIVE-11735:

[~ashutoshc]: This issue will occur in all queries that have predicates based on case-sensitive data. Any thoughts on whether I should proceed with fixing it for all of them? Changing the RowResolver class is causing test failures in other queries. Or is it by design?
[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906596#comment-14906596 ]

Chetna Chaudhari commented on HIVE-11735:

[~xuefuz] I tested it on the master branch, but we use 1.0.0, 0.14.0 and 1.2.1 in production, where we noticed this issue. I made the changes only for the group-by case, but since it is a change in the RowResolver class, it is failing for other cases. I will fix it for all cases and submit the patch again.
[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905244#comment-14905244 ]

Hive QA commented on HIVE-11735:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754389/HIVE-11735.patch

{color:red}ERROR:{color} -1 due to 148 failed/errored test(s), 9578 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_window
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4_cb_delim
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_lazyserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_array
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_casesensitive
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_onview
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multigroupby_singlemr
[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903659#comment-14903659 ]

Xuefu Zhang commented on HIVE-11735:

[~chetna], could you please update the "affected version"?
[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used
[ https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732343#comment-14732343 ]

Chetna Chaudhari commented on HIVE-11735:

The issue occurs while generating the OPTree. Causes:
1) Lowercase conversion of table and column aliases in the RowResolver class.
2) In genSelectPlan, the aggregations are first converted to lowercase and then added to the aggregations map.

Removing the toLowerCase() call from the mentioned places resolved the issue for group-by aggregations. Note that this issue will happen for all aggregations, joins and selects, since the same logic is there for other operators too. I would like to contribute the patch if everyone agrees on the change.
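The second cause can be illustrated outside Hive with a plain map. The sketch below is a simplified model, not the genSelectPlan code: it keys a LinkedHashMap by the expression text (real Hive uses the tree's string form), and `dedupe` is a hypothetical helper. It shows that lower-casing the key makes the two aggregations collide, and that the later put() replaces the earlier value, which would be consistent with the rightmost expression winning in both reported queries.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class LowerCaseKeyCollision {

    // Hypothetical helper: registers each expression under its own text,
    // optionally lower-cased first, the way a dedup map might.
    static Map<String, String> dedupe(boolean lowerCaseKeys, String... expressions) {
        Map<String, String> aggregations = new LinkedHashMap<>();
        for (String expr : expressions) {
            String key = lowerCaseKeys ? expr.toLowerCase(Locale.ROOT) : expr;
            aggregations.put(key, expr); // a later put with the same key overwrites the value
        }
        return aggregations;
    }

    public static void main(String[] args) {
        String left = "min(if(name = 'chetna', 4, 3))";
        String right = "min(if(name = 'Chetna', 4, 3))";

        // Lower-cased keys: one entry survives, holding the rightmost expression.
        System.out.println(dedupe(true, left, right));
        // Case-preserving keys: both expressions are kept.
        System.out.println(dedupe(false, left, right));
    }
}
```

Dropping the lower-casing fixes this toy case, but it also stops treating MIN(...) and min(...) as the same aggregation, so the deduplication becomes less effective for expressions that differ only in the case of keywords or column names.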