[jira] [Comment Edited] (HIVE-21660) Wrong result when union all and later view with explode is used

2020-03-09 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052726#comment-17052726
 ] 

Ganesha Shreedhara edited comment on HIVE-21660 at 3/9/20, 3:54 PM:


[~jcamachorodriguez] It looks like I do not have permission to create a PR. I 
have created a Review Board (RB) request instead 
([https://reviews.apache.org/r/72203/]). Please review. 



> Wrong result when union all and later view with explode is used
> ---
>
>              Key: HIVE-21660
>              URL: https://issues.apache.org/jira/browse/HIVE-21660
>          Project: Hive
>       Issue Type: Bug
>       Components: Physical Optimizer
> Affects Versions: 3.1.1
>         Reporter: Ganesha Shreedhara
>         Assignee: Ganesha Shreedhara
>         Priority: Major
>      Attachments: HIVE-21660.1.patch, HIVE-21660.patch
>
>
> Data is lost when rows are inserted into a partitioned table using 
> union all together with lateral view and explode. 
>  
> *Steps to reproduce:*
>  
> {code:java}
> -- Source tables: t1 holds one row, t2 holds one row with an array of dates.
> create table t1 (id int, dt string);
> insert into t1 values (2, '2019-04-01');
> create table t2 (id int, dates array<string>);
> insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') as dates;
> -- Destination table, partitioned dynamically on dt.
> create table dst (id int) partitioned by (dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> -- Union of a plain select and a lateral view with explode.
> insert overwrite table dst partition (dt)
> select t.id, t.dt from (
>   select id, dt from t1
>   union all
>   select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts
> ) t;
> select * from dst;
> {code}
>  
>  
> *Actual Result:*
> {code:java}
> +-----+-------------+
> | 2   | 2019-04-01  |
> +-----+-------------+{code}
>  
> *Expected Result* (Run only the select part from the above insert query)*:* 
> {code:java}
> +-----+-------------+
> | 2   | 2019-04-01  |
> | 1   | 2019-01-01  |
> | 1   | 2019-01-02  |
> | 1   | 2019-01-03  |
> +-----+-------------+{code}
>  
> The rows produced by the second branch of the union all (the lateral view 
> with explode over the second table) are missing from the destination table. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-21660) Wrong result when union all and later view with explode is used

2020-03-05 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052639#comment-17052639
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-21660 at 3/6/20, 1:22 AM:

[~ganeshas], I will review it. Can you rebase it (if needed) and create a PR? 
Thanks





[jira] [Comment Edited] (HIVE-21660) Wrong result when union all and later view with explode is used

2019-05-20 Thread Ganesha Shreedhara (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830329#comment-16830329
 ] 

Ganesha Shreedhara edited comment on HIVE-21660 at 5/20/19 9:47 AM:


When a lateral view is used together with union all, the same FileSinkOperator 
object is visited twice in removeUnionOperators while it searches all root 
operators for FileSinkOperator objects (Ref: 
[source code|https://github.com/apache/hive/blame/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java#L293]).

This happens because the operator tree for the second subquery, which contains 
the lateral view join, is formed as follows:

{code:java}
     TS17
      |
    LVF18
    /   \
SEL19   SEL20
  |       |
  |     UDTF22
   \     /
    LVJ21
      |
    SEL23
      |
    FS25{code}
 

FS25 sits below LVJ21, which has two parents (SEL19 and UDTF22), so the FS25 
object is visited twice here.

The first visit sets the directory of the FileSinkOperator object to 
*tablePath+UNION_SUDBIR_PREFIX_2* (the linked size is 2 because it is the 
second subquery of the union all query). 

When the same object is visited again, its directory is reset to 
*(tablePath+UNION_SUDBIR_PREFIX_2)+(UNION_SUDBIR_PREFIX_1)*.

As a result, the data written to the temp path derived from specPath 
(*tablePath+UNION_SUDBIR_PREFIX_2*) is not moved to the final path properly. 

 

The issue is resolved if we avoid setting the directory again for an object 
that has already been visited.
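
To make the double visit concrete, here is a minimal, self-contained sketch. It is not the Hive code: Op, buildSecondBranch, collectSinks, assignDir and the SUBDIR_ placeholder string are all hypothetical names, the "/" path separator is an assumption, and the linked-size bookkeeping is only modelled so that it yields the _2 then _1 sequence described above. It walks the diamond-shaped tree, shows the directory being suffixed twice, and then shows the effect of skipping an object that was already seen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnionSubdirSketch {
    // Placeholder for the value behind UNION_SUDBIR_PREFIX (assumption, not the real string).
    static final String SUBDIR_PREFIX = "SUBDIR_";

    /** Hypothetical stand-in for an operator node; not Hive's Operator class. */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        String dir; // only used by the file sink

        Op(String name) { this.name = name; }

        Op to(Op... kids) {
            Collections.addAll(children, kids);
            return this;
        }
    }

    /** Builds the tree from the comment: LVF18 fans out, LVJ21 joins the two sides again. */
    static Op buildSecondBranch(String tablePath) {
        Op fs25 = new Op("FS25");
        fs25.dir = tablePath;
        Op lvj21 = new Op("LVJ21").to(new Op("SEL23").to(fs25));
        Op sel19 = new Op("SEL19").to(lvj21);
        Op sel20 = new Op("SEL20").to(new Op("UDTF22").to(lvj21));
        return new Op("TS17").to(new Op("LVF18").to(sel19, sel20));
    }

    /** Depth-first walk that records a file sink once per path that reaches it. */
    static void collectSinks(Op op, List<Op> out) {
        if (op.name.startsWith("FS")) {
            out.add(op);
        }
        for (Op child : op.children) {
            collectSinks(child, out);
        }
    }

    /** Returns the final directory of FS25, with or without the "already seen" guard. */
    public static String assignDir(String tablePath, boolean skipSeen) {
        List<Op> visits = new ArrayList<>();
        collectSinks(buildSecondBranch(tablePath), visits); // FS25 is recorded twice

        // Sinks linked per path; the first union branch is assumed to hold slot 1 already.
        Map<String, Integer> linked = new HashMap<>();
        linked.put(tablePath, 1);

        // Identity-based set: we care about the same object, not about equals().
        Set<Op> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Op sink = null;
        for (Op fs : visits) {
            sink = fs;
            if (skipSeen && !seen.add(fs)) {
                continue; // same object reached through the other side of the diamond
            }
            int size = linked.merge(fs.dir, 1, Integer::sum);
            fs.dir = fs.dir + "/" + SUBDIR_PREFIX + size;
        }
        return sink.dir;
    }

    public static void main(String[] args) {
        System.out.println(assignDir("/warehouse/dst", false)); // /warehouse/dst/SUBDIR_2/SUBDIR_1
        System.out.println(assignDir("/warehouse/dst", true));  // /warehouse/dst/SUBDIR_2
    }
}
```

In this model the guard keeps the sink directory at the single-suffix path, which matches the temp path the data is written to. Whether the actual patch uses a seen set or another mechanism is not stated in this comment.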

 


