[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-10-19 Thread Chetna Chaudhari (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963337#comment-14963337
 ] 

Chetna Chaudhari commented on HIVE-11735:
-

[~ashutoshc]: Changes in column aliases and RR are needed, because we use RR 
for intermediate tables too. While debugging I have noticed that all predicates 
becomes column aliases for intermediate tables. So for my query it was creating 
only one column due to toLowerCase(). Please correct if I am wrong.

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
>Reporter: Chetna Chaudhari
>Assignee: Chetna Chaudhari
> Attachments: HIVE-11735.patch
>
>
> Hive if() udf is returns different results when string equality is used as 
> condition, with case change. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The rightmost udf result is pushed to predicates on left side. Leading 
> to same result for both the udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-10-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962009#comment-14962009
 ] 

Ashutosh Chauhan commented on HIVE-11735:
-

I think problem here stems from 
{code}
aggregations.put(expressionTree.toStringTree().toLowerCase(), expressionTree);
{code}

I think for your particular query if you remove {{toLowerCase()}} would solve 
your problem. Do you really need other changes for column aliases and such in 
RR?

Intent for this map is to detect duplicate functions in aggregations, so that 
we are not computing them twice. However, this is blindly doing 
{{toLoweCase()}} on full expression Tree, ignoring the fact that there might be 
constant literals in there. There are two possible solutions here : 

* Eliminate this logic altogether from this phase. Don't bother about 
duplicates in phase 1 analysis. Instead write a rule either on Calcite operator 
tree or Hive operator tree which walks on expressions and detects duplicates 
and fixes up operator tree to refer to 1 expression tree.
* Write a utility function which takes expression tree as an argument and 
returns lower case version of its string tree, while leaving constant string 
literals in original case. Then use this string representation as a key in that 
map.

IMHO, Option 1 is a cleaner approach. However, that might be a big change 
touching various pieces in planning.
Option 2 is much more local and contained change, but kinda inelegant.

cc: [~jpullokkaran] if he has other ideas. 

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
>Reporter: Chetna Chaudhari
>Assignee: Chetna Chaudhari
> Attachments: HIVE-11735.patch
>
>
> Hive if() udf is returns different results when string equality is used as 
> condition, with case change. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The rightmost udf result is pushed to predicates on left side. Leading 
> to same result for both the udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-10-17 Thread Chetna Chaudhari (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961904#comment-14961904
 ] 

Chetna Chaudhari commented on HIVE-11735:
-

[~ashutoshc]: This issue will occur in all queries wherever there are 
predicates based on case sensitive data. Any thoughts on whether I should 
proceed with fixing it for all. Because changing RowResolver class is causing 
test failures in other queries. Or its by design?

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
>Reporter: Chetna Chaudhari
>Assignee: Chetna Chaudhari
> Attachments: HIVE-11735.patch
>
>
> Hive if() udf is returns different results when string equality is used as 
> condition, with case change. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The rightmost udf result is pushed to predicates on left side. Leading 
> to same result for both the udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-09-24 Thread Chetna Chaudhari (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906596#comment-14906596
 ] 

Chetna Chaudhari commented on HIVE-11735:
-

[~xuefuz] I tested it on master branch, but we use 1.0.0, 0.14.0 and 1.2.1 in 
our production where we have noticed this issue. I made the changes only for 
groupby case, but since its a change in RowResolver class, it failing for other 
cases. Will fix it for all cases, and will submit the patch again.

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
>Reporter: Chetna Chaudhari
>Assignee: Chetna Chaudhari
> Attachments: HIVE-11735.patch
>
>
> Hive if() udf is returns different results when string equality is used as 
> condition, with case change. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The rightmost udf result is pushed to predicates on left side. Leading 
> to same result for both the udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-09-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905244#comment-14905244
 ] 

Hive QA commented on HIVE-11735:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754389/HIVE-11735.patch

{color:red}ERROR:{color} -1 due to 148 failed/errored test(s), 9578 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_window
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4_cb_delim
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_lazyserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_array
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_casesensitive
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_onview
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multigroupby_singlemr

[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-09-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903659#comment-14903659
 ] 

Xuefu Zhang commented on HIVE-11735:


[~chetna], could you please update the "affected version"?

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Reporter: Chetna Chaudhari
>Assignee: Chetna Chaudhari
> Attachments: HIVE-11735.patch
>
>
> Hive if() udf is returns different results when string equality is used as 
> condition, with case change. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The rightmost udf result is pushed to predicates on left side. Leading 
> to same result for both the udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

2015-09-06 Thread Chetna Chaudhari (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732343#comment-14732343
 ] 

Chetna Chaudhari commented on HIVE-11735:
-

Issue is occurring while generating OPTree . 
Causes:
1) lowercase conversion of table and column aliases in RowResolver class.
2) Also in genSelectPlan, the aggregations are first converted to lowercase and 
then added to aggregations map.

Removing the toLowerCase() call from mentioned places resolved the issue for 
group by aggregations. 
Note this issue will happen in case of all aggregations, joins and select, the 
same logic is there for other operators too.
Would like to contribute the patch if everyone agrees on the change. 

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>Reporter: Chetna Chaudhari
>Assignee: Chetna Chaudhari
>
> Hive if() udf is returns different results when string equality is used as 
> condition, with case change. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) both are 
> treated as equal.
>2) The rightmost udf result is pushed to predicates on left side. Leading 
> to same result for both the udfs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)