[jira] [Updated] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables

2023-03-15 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-27142:
-
Component/s: Statistics

>  Map Join not working as expected when joining non-native tables with native 
> tables
> ---
>
> Key: HIVE-27142
> URL: https://issues.apache.org/jira/browse/HIVE-27142
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *1. Issue:*
> When *_hive.auto.convert.join=true_* and the underlying query joins a large 
> non-native Hive table with a small native Hive table, the map join happens on 
> the wrong side, i.e. in the map tasks that scan the small native Hive table. 
> This can lead to OOM when the non-native table is really large, because only 
> a few map tasks are spawned to scan the small native Hive table.
>  
> *2. Why is this happening ?*
> This happens due to improper stats collection/computation for non-native Hive 
> tables. A non-native Hive table is actually stored in a location Hive does 
> not know about, and the temporary path visible to Hive when the table is 
> created does not hold the actual data. The stats collection logic therefore 
> tends to underestimate the data size/row count, which causes the map join to 
> be planned on the wrong side.
>  
> *3. Potential Solutions*
>  3.1 Turn off map-join auto conversion (*_hive.auto.convert.join=false_*). 
> This can negatively impact the query if it performs multiple joins, e.g. one 
> join with a non-native table and another where both tables are native.
>  3.2 Compute stats for the non-native table by firing an ANALYZE TABLE <> 
> command before joining native and non-native tables. The user may or may not 
> choose to do this.
>  3.3 Do not collect/estimate stats for non-native hive tables by default 
> (Preferred solution)





[jira] [Updated] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables

2023-03-15 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-27142:
-
Description: 
*1. Issue:*

When *_hive.auto.convert.join=true_* and the underlying query joins a large 
non-native Hive table with a small native Hive table, the map join happens on 
the wrong side, i.e. in the map tasks that scan the small native Hive table. 
This can lead to OOM when the non-native table is really large, because only a 
few map tasks are spawned to scan the small native Hive table.

*2. Why is this happening ?*

This happens due to improper stats collection/computation for non-native Hive 
tables. A non-native Hive table is actually stored in a location Hive does not 
know about, and the temporary path visible to Hive when the table is created 
does not hold the actual data. The stats collection logic therefore tends to 
underestimate the data size/row count, which causes the map join to be planned 
on the wrong side.
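
As a quick, hypothetical way to see what the optimizer is working with (the 
table name ext_tbl is an illustrative placeholder, not from this issue), the 
stats recorded for the table can be inspected:
{code:java}
-- Hypothetical non-native (storage-handler-backed) table name.
-- Under "Table Parameters", numRows/rawDataSize/totalSize are the figures
-- the optimizer compares when choosing the map-join side; per this issue,
-- for non-native tables they tend to be missing or underestimated.
describe formatted ext_tbl;
{code}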

 

*3. Potential Solutions*

 3.1 Turn off map-join auto conversion (*_hive.auto.convert.join=false_*). 
This can negatively impact the query if it performs multiple joins, e.g. one 
join with a non-native table and another where both tables are native.

 3.2 Compute stats for the non-native table by firing an ANALYZE TABLE <> 
command before joining native and non-native tables. The user may or may not 
choose to do this (see the sketch after this list).

 3.3 Do not collect/estimate stats for non-native hive tables by default 
(Preferred solution)
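
A minimal HiveQL sketch of workarounds 3.1 and 3.2; the table names 
non_native_tbl and small_native_tbl are illustrative placeholders, not from 
this issue:
{code:java}
-- 3.1: disable automatic map-join conversion for the session.
set hive.auto.convert.join=false;

-- 3.2: alternatively, compute table- and column-level stats on the
-- non-native table before the join so the optimizer sees realistic sizes.
analyze table non_native_tbl compute statistics;
analyze table non_native_tbl compute statistics for columns;

select *
from non_native_tbl b
join small_native_tbl s on b.k = s.k;
{code}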

  was:
*1. Issue:*

When *_hive.auto.convert.join=true_* and the underlying query joins a large 
non-native Hive table with a small native Hive table, the map join happens on 
the wrong side, i.e. in the map tasks that scan the small native Hive table. 
This can lead to OOM when the non-native table is really large, because only a 
few map tasks are spawned to scan the small native Hive table.

*2. Why is this happening ?*

This happens due to improper stats collection/computation for non-native Hive 
tables. A non-native Hive table is actually stored in a location Hive does not 
know about, and the temporary path visible to Hive when the table is created 
does not hold the actual data. The stats collection logic therefore tends to 
underestimate the data size/row count, which causes the map join to be planned 
on the wrong side.

*3. Potential Solutions*

 3.1 Turn off map-join auto conversion (*_hive.auto.convert.join=false_*). 
This can negatively impact the query if it performs multiple joins, e.g. one 
join with a non-native table and another where both tables are native.

 3.2 Compute stats for the non-native table by firing an ANALYZE TABLE <> 
command before joining native and non-native tables. The user may or may not 
choose to do this.

 3.3 Don't collect/estimate stats for non-native hive tables by default 
(Preferred solution)


>  Map Join not working as expected when joining non-native tables with native 
> tables
> ---
>
> Key: HIVE-27142
> URL: https://issues.apache.org/jira/browse/HIVE-27142
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *1. Issue:*
> When *_hive.auto.convert.join=true_* and the underlying query joins a large 
> non-native Hive table with a small native Hive table, the map join happens on 
> the wrong side, i.e. in the map tasks that scan the small native Hive table. 
> This can lead to OOM when the non-native table is really large, because only 
> a few map tasks are spawned to scan the small native Hive table.
>  
> *2. Why is this happening ?*
> This happens due to improper stats collection/computation for non-native Hive 
> tables. A non-native Hive table is actually stored in a location Hive does 
> not know about, and the temporary path visible to Hive when the table is 
> created does not hold the actual data. The stats collection logic therefore 
> tends to underestimate the data size/row count, which causes the map join to 
> be planned on the wrong side.
>  
> *3. Potential Solutions*
>  3.1 Turn off map-join auto conversion (*_hive.auto.convert.join=false_*). 
> This can negatively impact the query if it performs multiple joins, e.g. one 
> join with a non-native table and another where both tables are native.
>  3.2 Compute stats for the non-native table by firing an ANALYZE TABLE <> 
> command before joining native and non-native tables. The user may or may not 
> choose to do this.
>  3.3 Do not collect/estimate stats for non-native hive tables by default 
> (Preferred solution)

[jira] [Assigned] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables

2023-03-15 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-27142:



>  Map Join not working as expected when joining non-native tables with native 
> tables
> ---
>
> Key: HIVE-27142
> URL: https://issues.apache.org/jira/browse/HIVE-27142
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> *1. Issue:*
> When *_hive.auto.convert.join=true_* and the underlying query joins a large 
> non-native Hive table with a small native Hive table, the map join happens on 
> the wrong side, i.e. in the map tasks that scan the small native Hive table. 
> This can lead to OOM when the non-native table is really large, because only 
> a few map tasks are spawned to scan the small native Hive table.
>  
> *2. Why is this happening ?*
> This happens due to improper stats collection/computation for non-native Hive 
> tables. A non-native Hive table is actually stored in a location Hive does 
> not know about, and the temporary path visible to Hive when the table is 
> created does not hold the actual data. The stats collection logic therefore 
> tends to underestimate the data size/row count, which causes the map join to 
> be planned on the wrong side.
>  
> *3. Potential Solutions*
>  3.1 Turn off map-join auto conversion (*_hive.auto.convert.join=false_*). 
> This can negatively impact the query if it performs multiple joins, e.g. one 
> join with a non-native table and another where both tables are native.
>  3.2 Compute stats for the non-native table by firing an ANALYZE TABLE <> 
> command before joining native and non-native tables. The user may or may not 
> choose to do this.
>  3.3 Don't collect/estimate stats for non-native hive tables by default 
> (Preferred solution)





[jira] [Updated] (HIVE-26787) Pushdown Timestamp data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26787:
-
Component/s: Standalone Metastore

> Pushdown Timestamp data type to metastore via direct sql / JDO
> --
>
> Key: HIVE-26787
> URL: https://issues.apache.org/jira/browse/HIVE-26787
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Make the timestamp data type push down to the Hive metastore during partition 
> pruning. This is along the same lines as the jira: 
> https://issues.apache.org/jira/browse/HIVE-26778
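> A minimal sketch of the query shape this targets, assuming a hypothetical 
> table partitioned by a timestamp column (mirroring the date repro in 
> HIVE-26778):
> {code:java}
> create table part_ts(a int) partitioned by(ts_c timestamp);
> insert into part_ts partition(ts_c='2000-01-01 00:00:00') values (1);
> -- With the pushdown, the partition filter below can be evaluated in the
> -- metastore via direct SQL/JDO rather than by fetching all partitions.
> select * from part_ts where ts_c = '2000-01-01 00:00:00';
> {code}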





[jira] [Assigned] (HIVE-26787) Pushdown Timestamp data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-26787:



> Pushdown Timestamp data type to metastore via direct sql / JDO
> --
>
> Key: HIVE-26787
> URL: https://issues.apache.org/jira/browse/HIVE-26787
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> Make the timestamp data type push down to the Hive metastore during partition 
> pruning. This is along the same lines as the jira: 
> https://issues.apache.org/jira/browse/HIVE-26778





[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Affects Version/s: 4.0.0

> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}
>  
> *Performance Improvement*
> In my test setup the table had 10k partitions. A select query on one of the 
> partitions took 300 ms without the change and 14 ms with it.





[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Component/s: Standalone Metastore

> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}
>  
> *Performance Improvement*
> In my test setup the table had 10k partitions. A select query on one of the 
> partitions took 300 ms without the change and 14 ms with it.





[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

 

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}
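
For contrast, a minimal sketch that toggles CBO on the part_time table above; 
per the description, only the form of the pushed-down literal differs, and the 
DATE-prefixed form is what the metastore filter lexer rejects:
{code:java}
-- CBO on (the default): the generated partition filter is
-- date_c = DATE'2000-03-01', which hits the lexer error.
set hive.cbo.enable=true;
select * from part_time where date_c = '2000-03-01';

-- CBO off: the filter stays date_c = '2000-03-01' and parses fine.
set hive.cbo.enable=false;
select * from part_time where date_c = '2000-03-01';
{code}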
 

*Performance Improvement*

In my test setup the table had *10k partitions*. A select query on one of the 
partitions took *300 ms* without the change and *14 ms* with it.

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

*Performance Improvement*

In my test setup the table had 10k partitions. A select query on one of the 
partitions took 300 ms without the change and 14 ms with it.


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

[jira] [Commented] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640375#comment-17640375
 ] 

Syed Shameerur Rahman commented on HIVE-26778:
--

[~rajesh.balamohan] Yes, it is similar.

> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}
>  
> *Performance Improvement*
> In my test setup the table had 10k partitions. A select query on one of the 
> partitions took 300 ms without the change and 14 ms with it.





[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

*Performance Improvement*

In my test setup the table had 10k partitions. A select query on one of the 
partitions took 300 ms without the change and 14 ms with it.

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

*Performance Improvement [TODO]*


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.

[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

*Performance Improvement [TODO]*

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}

[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

*Performance Improvement [TODO]*

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

*Performance Improvement [TODO]*


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}

[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}

[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}

[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]

When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
message.
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition pruning 
> via direct SQL/JDO was added as part of the jira: 
> https://issues.apache.org/jira/browse/HIVE-5679
> Hive's behavior has since changed with CBO: when CBO is turned on, date data 
> types are not pushed down to the metastore, because CBO prepends the keyword 
> DATE to the literal in the filter. The filter parser is not written to handle 
> this extra keyword, so parsing fails and the date predicate is not pushed 
> down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The queries below generate "Error parsing partition filter; lexer error", 
> thrown from 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false) we don't see this error 
> message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1);
> select * from part_time where date_c = '2000-03-01';
> {code}

[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*

The queries below generate "Error parsing partition filter; lexer error", 
thrown from 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL/JDO was added as part of the jira: 
https://issues.apache.org/jira/browse/HIVE-5679

Hive's behavior has since changed with CBO: when CBO is turned on, date data 
types are not pushed down to the metastore, because CBO prepends the keyword 
DATE to the literal in the filter. The filter parser is not written to handle 
this extra keyword, so parsing fails and the date predicate is not pushed down 
to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

*Steps to reproduce*
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1);

select * from part_time where date_c = '2000-03-01';
{code}


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL / JDO was added as part of HIVE-5679: 
> https://issues.apache.org/jira/browse/HIVE-5679
> The behavior of Hive has changed with CBO: when CBO is turned on, date data 
> types are no longer pushed down to the metastore. CBO adds the extra keyword 
> 'DATE' to the original filter, and since the filter parser is not equipped 
> to handle this extra keyword, parsing fails and the date predicate is not 
> pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The following query will generate "Error parsing partition filter; lexer 
> error" (see 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128])
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1); 
> select * from part_time where date_c = '2000-03-01';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL / JDO was added as part of HIVE-5679: 
https://issues.apache.org/jira/browse/HIVE-5679

The behavior of Hive has changed with CBO: when CBO is turned on, date data 
types are no longer pushed down to the metastore. CBO adds the extra keyword 
'DATE' to the original filter, and since the filter parser is not equipped to 
handle this extra keyword, parsing fails and the date predicate is not pushed 
down to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

 

*Steps to reproduce*
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1); 

select * from part_time where date_c = '2000-03-01';

{code}

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL / JDO was added as part of HIVE-5679: 
https://issues.apache.org/jira/browse/HIVE-5679

The behavior of Hive has changed with CBO: when CBO is turned on, date data 
types are no longer pushed down to the metastore. CBO adds the extra keyword 
'DATE' to the original filter, and since the filter parser is not equipped to 
handle this extra keyword, parsing fails and the date predicate is not pushed 
down to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

 

*Steps to reproduce*
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1); 

select * from part_time where date_c = '2000-03-01';{code}


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL / JDO was added as part of HIVE-5679: 
> https://issues.apache.org/jira/browse/HIVE-5679
> The behavior of Hive has changed with CBO: when CBO is turned on, date data 
> types are no longer pushed down to the metastore. CBO adds the extra keyword 
> 'DATE' to the original filter, and since the filter parser is not equipped 
> to handle this extra keyword, parsing fails and the date predicate is not 
> pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1); 
> select * from part_time where date_c = '2000-03-01';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL / JDO was added as part of HIVE-5679: 
https://issues.apache.org/jira/browse/HIVE-5679

The behavior of Hive has changed with CBO: when CBO is turned on, date data 
types are no longer pushed down to the metastore. CBO adds the extra keyword 
'DATE' to the original filter, and since the filter parser is not equipped to 
handle this extra keyword, parsing fails and the date predicate is not pushed 
down to the metastore.
{code:java}
select * from test_table where date_col = '2022-01-01';
{code}
When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

 

*Steps to reproduce*
{code:java}
create table part_time(a int) partitioned by(date_c date);
insert into part_time partition(date_c='2000-01-01') values (1);
insert into part_time partition(date_c='2000-02-01') values (1);
insert into part_time partition(date_c='2000-03-01') values (1); 

select * from part_time where date_c = '2000-03-01';{code}

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL / JDO was added as part of HIVE-5679: 
https://issues.apache.org/jira/browse/HIVE-5679

The behavior of Hive has changed with CBO: when CBO is turned on, date data 
types are no longer pushed down to the metastore. CBO adds the extra keyword 
'DATE' to the original filter, and since the filter parser is not equipped to 
handle this extra keyword, parsing fails and the date predicate is not pushed 
down to the metastore.


{code:java}
select * from test_table where date_col = '2022-01-01';
{code}

When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL / JDO was added as part of HIVE-5679: 
> https://issues.apache.org/jira/browse/HIVE-5679
> The behavior of Hive has changed with CBO: when CBO is turned on, date data 
> types are no longer pushed down to the metastore. CBO adds the extra keyword 
> 'DATE' to the original filter, and since the filter parser is not equipped 
> to handle this extra keyword, parsing fails and the date predicate is not 
> pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1); 
> select * from part_time where date_c = '2000-03-01';{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26778:
-
Description: 
The original feature to push down the date data type during partition pruning 
via direct SQL / JDO was added as part of HIVE-5679: 
https://issues.apache.org/jira/browse/HIVE-5679

The behavior of Hive has changed with CBO: when CBO is turned on, date data 
types are no longer pushed down to the metastore. CBO adds the extra keyword 
'DATE' to the original filter, and since the filter parser is not equipped to 
handle this extra keyword, parsing fails and the date predicate is not pushed 
down to the metastore.


{code:java}
select * from test_table where date_col = '2022-01-01';
{code}

When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01', which the filter parser fails to recognize.

  was:
The original feature to push down the date data type during partition pruning 
via direct SQL / JDO was added as part of HIVE-5679: 
https://issues.apache.org/jira/browse/HIVE-5679

The behavior of Hive has changed with CBO: when CBO is turned on, date data 
types are no longer pushed down to the metastore. CBO adds the extra keyword 
'DATE' to the original filter, and since the filter parser is not equipped to 
handle this extra keyword, parsing fails and the date predicate is not pushed 
down to the metastore.


{code:java}
select * from test_table where date_col = '2022-01-01';
{code}

When CBO is turned on, the filter predicate generated is 
date_col=DATE'2022-01-01'.


> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL / JDO was added as part of HIVE-5679: 
> https://issues.apache.org/jira/browse/HIVE-5679
> The behavior of Hive has changed with CBO: when CBO is turned on, date data 
> types are no longer pushed down to the metastore. CBO adds the extra keyword 
> 'DATE' to the original filter, and since the filter parser is not equipped 
> to handle this extra keyword, parsing fails and the date predicate is not 
> pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2022-11-25 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-26778:



> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL / JDO was added as part of HIVE-5679: 
> https://issues.apache.org/jira/browse/HIVE-5679
> The behavior of Hive has changed with CBO: when CBO is turned on, date data 
> types are no longer pushed down to the metastore. CBO adds the extra keyword 
> 'DATE' to the original filter, and since the filter parser is not equipped 
> to handle this extra keyword, parsing fails and the date predicate is not 
> pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26495) MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus

2022-08-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597154#comment-17597154
 ] 

Syed Shameerur Rahman commented on HIVE-26495:
--

LGTM +1

> MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus
> 
>
> Key: HIVE-26495
> URL: https://issues.apache.org/jira/browse/HIVE-26495
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With hive.metastore.fshandler.threads = 15, all 15 *MSCK-GetPaths-xx* threads 
> are blocked at the following trace.
> {code:java}
> "MSCK-GetPaths-11" #12345 daemon prio=5 os_prio=0 tid= nid= waiting on 
> condition [0x7f9f099a6000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0003f92d1668> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>     at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> ...
> at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3230)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1995)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:550)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:543)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:525)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750){code}
> We should take advantage of the non-blocking listStatusIterator instead of 
> listStatus, which is a blocking call.
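
The fix direction suggested above swaps the blocking, array-returning call for 
the iterator API. A minimal sketch of the difference, assuming a Hadoop 
FileSystem handle and a placeholder path (illustrative only, not the 
HiveMetaStoreChecker code):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListingSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/warehouse/some_table"); // placeholder path

    // Blocking: does not return until the entire listing has been fetched.
    FileStatus[] all = fs.listStatus(dir);

    // Iterator-based: entries are consumed as they arrive, which object
    // stores such as S3A can serve incrementally, page by page.
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}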



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26495) MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus

2022-08-24 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584536#comment-17584536
 ] 

Syed Shameerur Rahman edited comment on HIVE-26495 at 8/25/22 1:35 AM:
---

Do we see any performance improvement of the msck command by doing this?


was (Author: srahman):
Do we see any performance improvement of the mock command by doing this?

> MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus
> 
>
> Key: HIVE-26495
> URL: https://issues.apache.org/jira/browse/HIVE-26495
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With hive.metastore.fshandler.threads = 15, all 15 *MSCK-GetPaths-xx* threads 
> are blocked at the following trace.
> {code:java}
> "MSCK-GetPaths-11" #12345 daemon prio=5 os_prio=0 tid= nid= waiting on 
> condition [0x7f9f099a6000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0003f92d1668> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>     at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> ...
> at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3230)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1995)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:550)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:543)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:525)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750){code}
> We should take advantage of the non-blocking listStatusIterator instead of 
> listStatus, which is a blocking call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26495) MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus

2022-08-24 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584536#comment-17584536
 ] 

Syed Shameerur Rahman commented on HIVE-26495:
--

Do we see any performance improvement of the mock command by doing this?

> MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus
> 
>
> Key: HIVE-26495
> URL: https://issues.apache.org/jira/browse/HIVE-26495
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With hive.metastore.fshandler.threads = 15, all 15 *MSCK-GetPaths-xx* threads 
> are blocked at the following trace.
> {code:java}
> "MSCK-GetPaths-11" #12345 daemon prio=5 os_prio=0 tid= nid= waiting on 
> condition [0x7f9f099a6000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0003f92d1668> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>     at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> ...
> at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3230)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1995)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:550)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:543)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:525)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750){code}
> We should take advantage of the non-blocking listStatusIterator instead of 
> listStatus, which is a blocking call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26467) SessionState should be accessible inside ThreadPool

2022-08-12 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-26467:
-
Description: 
Currently SessionState.get() returns null if it is called inside a ThreadPool. 
If any custom third-party component leverages SessionState.get() for operations 
like getting the session state or the session config inside a thread pool, it 
will get null, since the session state is thread-local 
(https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L622) 
and ThreadLocal variables are not inherited by child threads / thread pools.

So one solution is to make the thread-local variable inheritable so that the 
SessionState gets propagated to child threads.
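
A minimal, self-contained sketch of the thread-local inheritance the proposal 
relies on (illustrative only, not Hive's SessionState code): a value set in the 
parent thread is visible in a thread created afterwards only when an 
InheritableThreadLocal is used, and the copy happens at thread creation, so 
pooled threads created before the set() would not see it.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InheritableSketch {
  // Plain ThreadLocal: every thread starts with no value of its own.
  private static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
  // InheritableThreadLocal: child threads copy the parent's value at creation.
  private static final InheritableThreadLocal<String> INHERITABLE =
      new InheritableThreadLocal<>();

  public static void main(String[] args) throws Exception {
    PLAIN.set("session-state");
    INHERITABLE.set("session-state");
    // The pool's worker thread is created after the values were set above.
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(() -> {
      System.out.println("plain       = " + PLAIN.get());       // null
      System.out.println("inheritable = " + INHERITABLE.get()); // session-state
    }).get();
    pool.shutdown();
  }
}
{code}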

  was:
Currently SessionState.get() returns null if it is called inside a ThreadPool. 
If any custom third-party component leverages SessionState.get() for operations 
like getting the session state or the session config, it will get null, since 
the session state is thread-local 
(https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L622) 
and ThreadLocal variables are not inherited by child threads / thread pools.

So one solution is to make the thread-local variable inheritable so that the 
SessionState gets propagated to child threads.


> SessionState should be accessible inside ThreadPool
> ---
>
> Key: HIVE-26467
> URL: https://issues.apache.org/jira/browse/HIVE-26467
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently SessionState.get() returns null if it is called inside a 
> ThreadPool. If any custom third-party component leverages SessionState.get() 
> for operations like getting the session state or the session config inside a 
> thread pool, it will get null, since the session state is thread-local 
> (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L622) 
> and ThreadLocal variables are not inherited by child threads / thread pools.
> So one solution is to make the thread-local variable inheritable so that the 
> SessionState gets propagated to child threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26467) SessionState should be accessible inside ThreadPool

2022-08-12 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-26467:



> SessionState should be accessible inside ThreadPool
> ---
>
> Key: HIVE-26467
> URL: https://issues.apache.org/jira/browse/HIVE-26467
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently SessionState.get() returns null if it is called inside a 
> ThreadPool. If any custom third-party component leverages SessionState.get() 
> for operations like getting the session state or the session config, it will 
> get null, since the session state is thread-local 
> (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L622) 
> and ThreadLocal variables are not inherited by child threads / thread pools.
> So one solution is to make the thread-local variable inheritable so that the 
> SessionState gets propagated to child threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-05-26 Thread Syed Shameerur Rahman (Jira)


[ https://issues.apache.org/jira/browse/HIVE-25907 ]


Syed Shameerur Rahman deleted comment on HIVE-25907:
--

was (Author: srahman):
[~kgyrtkirk] - Could you please review the PR?

Thanks

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After execution of the above query, it is expected that the destination 
> directory contains data from the table iowd, but due to HIVE-21386 this is no 
> longer happening.
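
Until the fix lands, the repro above also implies a session-level workaround 
(an assumption drawn from the description, not a verified fix): disable the 
result cache before running the IOW DIRECTORY query. The destination 
placeholder below is hypothetical.
{code:java}
-- Assumed workaround based on the description above:
set hive.query.results.cache.enabled=false;
INSERT OVERWRITE DIRECTORY "<destination-dir>" SELECT * FROM iowd;
{code}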



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-05-26 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542451#comment-17542451
 ] 

Syed Shameerur Rahman edited comment on HIVE-25980 at 5/26/22 11:37 AM:


[~ste...@apache.org] recursive listing can help, but there are some 
disadvantages as well. The current implementation of recursive listing lists 
all the files, including the leaf / data files. If there are a large number of 
data files in each partition, then recursive listing will be slower than the 
parallel BFS which we have in place.

PS: In the current tree walk we list the files only till the partition level 
(i.e. one level above the data files).
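
To make the trade-off concrete, a small sketch against Hadoop's FileSystem API 
(placeholder path; illustrative, not the actual checker code): the recursive 
call surfaces every data file, while a per-level walk can stop at the 
partition directories.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListingCostSketch {
  public static void main(String[] args) throws Exception {
    Path tableRoot = new Path("/warehouse/big_table"); // placeholder path
    FileSystem fs = tableRoot.getFileSystem(new Configuration());

    // Recursive listing: surfaces every file under the root, leaf data files
    // included, so its cost grows with the number of files per partition.
    RemoteIterator<LocatedFileStatus> every = fs.listFiles(tableRoot, true);
    while (every.hasNext()) {
      every.next();
    }

    // Level-by-level walk (what the parallel BFS builds on): one listStatus
    // per directory, which can stop at the partition level, one level above
    // the data files.
    FileStatus[] children = fs.listStatus(tableRoot);
  }
}
{code}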


was (Author: srahman):
[~ste...@apache.org] recursive listing can help, but there are some 
disadvantages as well. The current implementation of recursive listing lists 
all the files, including the leaf / data files. If there are a large number of 
data files in each partition, then recursive listing will be slower than the 
parallel BFS which we have in place.

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with a high number of partitions can perform 
> slowly on cloud storage such as S3; one of the cases we found showed the 
> slowness in HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClie

[jira] [Commented] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-05-26 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542451#comment-17542451
 ] 

Syed Shameerur Rahman commented on HIVE-25980:
--

[~ste...@apache.org] recursive listing can help, but there are some 
disadvantages as well. The current implementation of recursive listing lists 
all the files, including the leaf / data files. If there are a large number of 
data files in each partition, then recursive listing will be slower than the 
parallel BFS which we have in place.

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with a high number of partitions can perform 
> slowly on cloud storage such as S3; one of the cases we found showed the 
> slowness in HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>   

[jira] [Commented] (HIVE-22753) Fix gradual mem leak: Operationlog related appenders should be cleared up on errors

2022-02-14 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491901#comment-17491901
 ] 

Syed Shameerur Rahman commented on HIVE-22753:
--

[~euigeun_chung] [~chengyong] Do you have reproducible steps or a test case for 
the same?

> Fix gradual mem leak: Operationlog related appenders should be cleared up on 
> errors 
> 
>
> Key: HIVE-22753
> URL: https://issues.apache.org/jira/browse/HIVE-22753
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22753.1.patch, HIVE-22753.2.patch, 
> HIVE-22753.3.patch, HIVE-22753.4.patch, image-2020-01-21-11-14-37-911.png, 
> image-2020-01-21-11-17-59-279.png, image-2020-01-21-11-18-37-294.png
>
>
> In case of an exception in SQLOperation, the operation log does not get 
> cleared up. This causes a gradual build-up of HushableRandomAccessFileAppender 
> instances, causing HS2 to OOM after some time.
> !image-2020-01-21-11-14-37-911.png|width=936,height=580!
>  
> Allocation tree
> !image-2020-01-21-11-18-37-294.png|width=929,height=389!
>  
> Prod instance mem
> !image-2020-01-21-11-17-59-279.png|width=671,height=201!
>  
> Each HushableRandomAccessFileAppender holds an internal reference to a 
> RandomAccessFileAppender, which holds a 256 KB byte buffer, causing the 
> memory leak.
> Related ticket: HIVE-18820



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25942) Upgrade commons-io to 2.8.0 due to CVE-2021-29425

2022-02-09 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-25942:



> Upgrade commons-io to 2.8.0 due to CVE-2021-29425
> -
>
> Key: HIVE-25942
> URL: https://issues.apache.org/jira/browse/HIVE-25942
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> Due to [CVE-2021-29425|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] all 
> the commons-io versions below 2.7 are affected.
> Tez and Hadoop have upgraded commons-io to 2.8.0 in 
> [TEZ-4353|https://issues.apache.org/jira/browse/TEZ-4353] and 
> [HADOOP-17683|https://issues.apache.org/jira/browse/HADOOP-17683] 
> respectively, and it would be good if Hive follows the same.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-01-30 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484495#comment-17484495
 ] 

Syed Shameerur Rahman commented on HIVE-25907:
--

[~kgyrtkirk] - Could you please review the PR?

Thanks

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After execution of the above query, it is expected that the destination 
> directory contains data from the table iowd, but due to HIVE-21386 this is no 
> longer happening.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-01-27 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-25907:
-
Affects Version/s: 4.0.0

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After execution of the above query, it is expected that the destination 
> directory contains data from the table iowd, but due to HIVE-21386 this is no 
> longer happening.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-01-27 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-25907:



> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After execution of the above query, it is expected that the destination 
> directory contains data from the table iowd, but due to HIVE-21386 this is no 
> longer happening.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-15 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444267#comment-17444267
 ] 

Syed Shameerur Rahman commented on HIVE-25680:
--

[~kgyrtkirk] Could you please review the changes?


> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Apache Hue, or any other application that uses the #get_table_meta API, is 
> not gated by any of the authorization models that HiveMetastore provides.
> For more information on the Storage Based Authorization model, see: 
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin", provide a 
> password, and access the Hive Editor.
> 2. In the SQL section on the left, under Databases, you should see default 
> and database1 listed. Click on database1.
> 3. As you can see, a table called table1 is listed => this should not be 
> possible, as our admin user has no HDFS grants on /datasets/database1.
> 4. Run the following query from the Hive editor: SHOW TABLES; The output 
> shows a Permission denied error => this is the expected behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-15 Thread Syed Shameerur Rahman (Jira)


[ https://issues.apache.org/jira/browse/HIVE-25680 ]


Syed Shameerur Rahman deleted comment on HIVE-25680:
--

was (Author: srahman):
[~kgyrtkirk] Could you please review the changes?

> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Apache Hue, or any other application that uses the #get_table_meta API, is 
> not gated by any of the authorization models that HiveMetastore provides.
> For more information on the Storage Based Authorization model, see: 
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin", provide a 
> password, and access the Hive Editor.
> 2. In the SQL section on the left, under Databases, you should see default 
> and database1 listed. Click on database1.
> 3. As you can see, a table called table1 is listed => this should not be 
> possible, as our admin user has no HDFS grants on /datasets/database1.
> 4. Run the following query from the Hive editor: SHOW TABLES; The output 
> shows a Permission denied error => this is the expected behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-08 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440904#comment-17440904
 ] 

Syed Shameerur Rahman commented on HIVE-25680:
--

[~kgyrtkirk] Could you please review the changes?

> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Apache Hue, or any other application that uses the #get_table_meta API, is 
> not gated by any of the authorization models that HiveMetastore provides.
> For more information on the Storage Based Authorization model, see: 
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin", provide a 
> password, and access the Hive Editor.
> 2. In the SQL section on the left, under Databases, you should see default 
> and database1 listed. Click on database1.
> 3. As you can see, a table called table1 is listed => this should not be 
> possible, as our admin user has no HDFS grants on /datasets/database1.
> 4. Run the following query from the Hive editor: SHOW TABLES; The output 
> shows a Permission denied error => this is the expected behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-08 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-25680:
-
Component/s: Standalone Metastore

> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>
> Apache Hue, or any other application that uses the #get_table_meta API, is 
> not gated by any of the authorization models that HiveMetastore provides.
> For more information on the Storage Based Authorization model, see: 
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin", provide a 
> password, and access the Hive Editor.
> 2. In the SQL section on the left, under Databases, you should see default 
> and database1 listed. Click on database1.
> 3. As you can see, a table called table1 is listed => this should not be 
> possible, as our admin user has no HDFS grants on /datasets/database1.
> 4. Run the following query from the Hive editor: SHOW TABLES; The output 
> shows a Permission denied error => this is the expected behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-08 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-25680:



> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>
> Apache Hue, or any other application that uses the #get_table_meta API, is 
> not gated by any of the authorization models that HiveMetastore provides.
> For more information on the Storage Based Authorization model, see: 
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin", provide a 
> password, and access the Hive Editor
> 2. In the SQL section on the left, under Databases, you should see default and 
> database1 listed. Click on database1
> 3. As you can see, a table called table1 is listed => this should not be 
> possible as our admin user has no HDFS grants on /datasets/database1
> 4. Run the following query from the Hive editor: SHOW TABLES; The output shows 
> a Permission denied error => this is the expected behavior
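> 
> For illustration, the ungated path can be exercised directly against the 
> metastore. A minimal client-side sketch (assuming hive.metastore.uris points 
> at the HMS under test; class and variable names here are ours):
> {code:java}
> import java.util.Collections;
> 
> import org.apache.hadoop.hive.conf.HiveConf;
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
> import org.apache.hadoop.hive.metastore.api.TableMeta;
> 
> public class GetTableMetaProbe {
>   public static void main(String[] args) throws Exception {
>     HiveConf conf = new HiveConf();
>     HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
>     // #get_table_meta returns the matching tables without consulting the
>     // configured StorageBasedAuthorizationProvider, which is the gap
>     // described above.
>     for (TableMeta meta : client.getTableMeta("database1", "*",
>         Collections.emptyList())) {
>       System.out.println(meta.getDbName() + "." + meta.getTableName());
>     }
>     client.close();
>   }
> }
> {code}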



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

2021-09-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422080#comment-17422080
 ] 

Syed Shameerur Rahman commented on HIVE-25443:
--

[~kgyrtkirk] Could you please review the changes?
Thanks

> Arrow SerDe Cannot serialize/deserialize complex data types When there are 
> more than 1024 values
> 
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using 
> Arrow SerDe when there are more than 1024 values. This happens because the 
> ColumnVector is always initialized with a size of 1024.
> Issue #1 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
> 
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
> 
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}
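> 
> The failure pattern can be illustrated outside Arrow. A minimal, hypothetical 
> sketch in plain Java (names are ours; this is not the SerDe's actual code) of 
> why a fixed 1024-slot allocation breaks on row 1025, and what the fix amounts 
> to:
> {code:java}
> import java.util.Arrays;
> 
> public class FixedCapacityDemo {
>   private static final int DEFAULT_SIZE = 1024; // mirrors the ColumnVector default
> 
>   public static void main(String[] args) {
>     boolean[] slots = new boolean[DEFAULT_SIZE];
>     int rows = 1025;
>     for (int i = 0; i < rows; i++) {
>       // Without this guard, writing row 1024 (the 1025th value) throws
>       // ArrayIndexOutOfBoundsException; the analogue of the SerDe failure.
>       if (i >= slots.length) {
>         slots = Arrays.copyOf(slots, slots.length * 2); // size to the real batch
>       }
>       slots[i] = true;
>     }
>     System.out.println("wrote " + rows + " values into " + slots.length + " slots");
>   }
> }
> {code}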



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

2021-08-11 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397855#comment-17397855
 ] 

Syed Shameerur Rahman edited comment on HIVE-25443 at 8/12/21, 6:39 AM:


Update: The failing qtest looks flaky since it passed on my local machine


{code:java}
mvn clean test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_split_filter.q

[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.511 s 
- in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  49.319 s

{code}



was (Author: srahman):
Update: The failing qtest looks flaky since it passed on my local machine


{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.511 s 
- in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  49.319 s

{code}


> Arrow SerDe Cannot serialize/deserialize complex data types When there are 
> more than 1024 values
> 
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using 
> Arrow SerDe when there are more than 1024 values. This happens because the 
> ColumnVector is always initialized with a size of 1024.
> Issue #1 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
> 
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
> 
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

2021-08-11 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397855#comment-17397855
 ] 

Syed Shameerur Rahman commented on HIVE-25443:
--

Update: The failing qtest looks flaky since it passed on my local machine


{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.511 s 
- in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  49.319 s

{code}


> Arrow SerDe Cannot serialize/deserialize complex data types When there are 
> more than 1024 values
> 
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using 
> Arrow SerDe when there are more than 1024 values. This happens because the 
> ColumnVector is always initialized with a size of 1024.
> Issue #1 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
> 
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
> 
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

2021-08-11 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25443 started by Syed Shameerur Rahman.

> Arrow SerDe Cannot serialize/deserialize complex data types When there are 
> more than 1024 values
> 
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using 
> Arrow SerDe when there are more than 1024 values. This happens because the 
> ColumnVector is always initialized with a size of 1024.
> Issue #1 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
> 
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
> 
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

2021-08-11 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397249#comment-17397249
 ] 

Syed Shameerur Rahman commented on HIVE-25443:
--

[~kgyrtkirk] [~pgaref] Could you please review the pull request?
Thanks

> Arrow SerDe Cannot serialize/deserialize complex data types When there are 
> more than 1024 values
> 
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using 
> Arrow SerDe when there are more than 1024 values. This happens because the 
> ColumnVector is always initialized with a size of 1024.
> Issue #1 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
> 
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
> 
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values

2021-08-11 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-25443:



> Arrow SerDe Cannot serialize/deserialize complex data types When there are 
> more than 1024 values
> 
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.2, 3.1.1, 3.0.0, 3.1.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using 
> Arrow SerDe when there are more than 1024 values. This happens because the 
> ColumnVector is always initialized with a size of 1024.
> Issue #1 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe :
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
> 
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
> 
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-04 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278754#comment-17278754
 ] 

Syed Shameerur Rahman commented on HIVE-24584:
--

[~amagyar] Thanks for coming up with the test case.

The change looks good to me (+1).
Looping in [~kgyrtkirk] for an extra confirmation!

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: msckrepro.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-01 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276212#comment-17276212
 ] 

Syed Shameerur Rahman commented on HIVE-24584:
--

Hi [~amagyar],

Unfortunately I am not able to reproduce this. Here is my setup:

Hive version - 3.1.2. I tried this in non-embedded mode, i.e. HMS running in a 
different JVM from HS2, with the SQL DB hosted locally (running on the same node 
where my HS2 and HMS servers are running). 

Am I missing anything?


> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273303#comment-17273303
 ] 

Syed Shameerur Rahman commented on HIVE-24584:
--

[~amagyar] Do you have a reproducible use case? I am not able to reproduce this 
on Hive 3.1.2. Is this an issue on the current master or on a different Hive 
version?

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24690) GlobalLimitOptimizer Fails To Identify Some Queries With LIMIT Operator

2021-01-27 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-24690:
-
Description: 
As per 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L88]
 queries like
{code:java}
CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT ..
INSERT OVERWRITE TABLE ... SELECT col1, hash(col2), split(col1) FROM ... 
LIMIT...
{code}
fall under the category of qualified queries, but after HIVE-9444 they do not.

On investigating this issue, it was found that for a
{code:java}
CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT 
{code}
query the operator tree looks like *TS -> SEL -> LIM -> RS -> SEL -> LIM -> FS*

Since only one LIMIT operator is allowed as per 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L196
, the *GlobalLimitOptimizer* fails to identify such queries.

*Steps To Reproduce*

{code:java}
set hive.limit.optimize.enable=true;
create table t1 (a int);
create table t2 as select * from t1 LIMIT 10;
{code}




  was:
As per 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L88]
 queries like
{code:java}
CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT ..
INSERT OVERWRITE TABLE ... SELECT col1, hash(col2), split(col1) FROM ... 
LIMIT...
{code}
fall under the category of qualified queries, but after HIVE-9444 they do not.

On investigating this issue, it was found that for a
{code:java}
CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT 
{code}
query the operator tree looks like *TS -> SEL -> LIM -> RS -> SEL -> LIM -> FS*

Since only one LIMIT operator is allowed as per 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L196
, the *GlobalLimitOptimizer* fails to identify such queries.

*Steps To Reproduce*

{code:java}
set hive.limit.optimize.enable=true;
create table t1 (a int);
create table t2 select * from t1 LIMIT 10;
{code}





> GlobalLimitOptimizer Fails To Identify Some Queries With LIMIT Operator
> ---
>
> Key: HIVE-24690
> URL: https://issues.apache.org/jira/browse/HIVE-24690
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.1.0, 2.1.0, 3.1.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> As per 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L88]
>  queries like
> {code:java}
> CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT ..
> INSERT OVERWRITE TABLE ... SELECT col1, hash(col2), split(col1) FROM ... 
> LIMIT...
> {code}
> fall under the category of qualified queries, but after HIVE-9444 they do not.
> On investigating this issue, it was found that for a
> {code:java}
> CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT 
> {code}
> query the operator tree looks like *TS -> SEL -> LIM -> RS -> SEL -> LIM -> 
> FS*
> Since only one LIMIT operator is allowed as per 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L196
> , the *GlobalLimitOptimizer* fails to identify such queries.
> *Steps To Reproduce*
> {code:java}
> set hive.limit.optimize.enable=true;
> create table t1 (a int);
> create table t2 as select * from t1 LIMIT 10;
> {code}
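> 
> For intuition, a hypothetical sketch in plain Java (names are ours; this is 
> not the optimizer's actual code) of why a check that admits exactly one LIMIT 
> rejects the operator tree above:
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> 
> public class LimitCountDemo {
>   public static void main(String[] args) {
>     // Operator path of the CTAS ... LIMIT plan from the description; the
>     // LIMIT is duplicated on both sides of the shuffle (RS).
>     List<String> path = Arrays.asList("TS", "SEL", "LIM", "RS", "SEL", "LIM", "FS");
>     long limits = path.stream().filter("LIM"::equals).count();
>     System.out.println("LIMIT operators in plan: " + limits);                // 2
>     System.out.println("admitted by an 'exactly one LIMIT' rule: " + (limits == 1)); // false
>   }
> }
> {code}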



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24690) GlobalLimitOptimizer Fails To Identify Some Queries With LIMIT Operator

2021-01-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272804#comment-17272804
 ] 

Syed Shameerur Rahman commented on HIVE-24690:
--

CC: [~jcamachorodriguez]

> GlobalLimitOptimizer Fails To Identify Some Queries With LIMIT Operator
> ---
>
> Key: HIVE-24690
> URL: https://issues.apache.org/jira/browse/HIVE-24690
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.1.0, 2.1.0, 3.1.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> As per 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L88]
>  queries like
> {code:java}
> CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT ..
> INSERT OVERWRITE TABLE ... SELECT col1, hash(col2), split(col1) FROM ... 
> LIMIT...
> {code}
> fall under the category of qualified queries, but after HIVE-9444 they do not.
> On investigating this issue, it was found that for a
> {code:java}
> CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT 
> {code}
> query the operator tree looks like *TS -> SEL -> LIM -> RS -> SEL -> LIM -> 
> FS*
> Since only one LIMIT operator is allowed as per 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L196
> , the *GlobalLimitOptimizer* fails to identify such queries.
> *Steps To Reproduce*
> {code:java}
> set hive.limit.optimize.enable=true;
> create table t1 (a int);
> create table t2 select * from t1 LIMIT 10;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24690) GlobalLimitOptimizer Fails To Identify Some Queries With LIMIT Operator

2021-01-27 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-24690:



> GlobalLimitOptimizer Fails To Identify Some Queries With LIMIT Operator
> ---
>
> Key: HIVE-24690
> URL: https://issues.apache.org/jira/browse/HIVE-24690
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.0, 2.1.0, 1.1.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> As per 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L88]
>  queries like
> {code:java}
> CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT ..
> INSERT OVERWRITE TABLE ... SELECT col1, hash(col2), split(col1) FROM ... 
> LIMIT...
> {code}
> fall under the category of qualified queries, but after HIVE-9444 they do not.
> On investigating this issue, it was found that for a
> {code:java}
> CREATE TABLE ... AS SELECT col1, col2 FROM tbl LIMIT 
> {code}
> query the operator tree looks like *TS -> SEL -> LIM -> RS -> SEL -> LIM -> 
> FS*
> Since only one LIMIT operator is allowed as per 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GlobalLimitOptimizer.java#L196
> , the *GlobalLimitOptimizer* fails to identify such queries.
> *Steps To Reproduce*
> {code:java}
> set hive.limit.optimize.enable=true;
> create table t1 (a int);
> create table t2 select * from t1 LIMIT 10;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2021-01-22 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~abstractdog] Could you please review the PR?)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911. But now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362), we 
> could re-use that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features such as 
> vertex and failed-task-attempt shuffle data clean-up; refer to TEZ-3363 and 
> TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2021-01-21 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269288#comment-17269288
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~kgyrtkirk] I have resolved the merge conflict and now all tests are passing!

Are we good to merge this?

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2, 3.0.0, 3.1.1, 3.1.2, 4.0.0
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove the DISTRIBUTE BY 
> clause or use a CLUSTER BY clause.
> It seems that the issue happens because DISTRIBUTE BY does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we use 
> DISTRIBUTE BY:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
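> 
> As an illustration of the workaround mentioned above (same tables as in the 
> reproduction; a sketch, not a fix for the underlying bug), replacing the final 
> clause avoids the NPE because CLUSTER BY additionally sorts by the distributed 
> key:
> {code:java}
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> cluster by datekey;
> {code}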
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.e

[jira] [Comment Edited] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263983#comment-17263983
 ] 

Syed Shameerur Rahman edited comment on HIVE-24584 at 1/13/21, 8:43 AM:


[~amagyar] - As per my understanding, all msck command flows default to using 
MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given.
So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS, or am I 
missing anything? The above-mentioned problem doesn't arise if you use the 
default path.


{code:java}
public static Configuration getMsckConf(Configuration conf) {
  // the only reason we are using a new conf here is to override EXPRESSION_PROXY_CLASS
  Configuration metastoreConf = MetastoreConf.newMetastoreConf(new Configuration(conf));
  metastoreConf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
      metastoreConf.get(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
          MsckPartitionExpressionProxy.class.getCanonicalName()));
  return metastoreConf;
}
{code}
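
For reference, the explicit override being asked about would look like this (a 
hypothetical snippet; with such an override, msck's drop-partition path 
deserializes through PartitionExpressionForMetastore and can hit the reported 
Kryo failure):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf;

public class ProxyOverrideDemo {
  public static void main(String[] args) {
    Configuration conf = MetastoreConf.newMetastoreConf();
    // Hypothetical explicit override of the default proxy class.
    conf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
        "org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore");
    System.out.println(
        conf.get(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname()));
  }
}
{code}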



was (Author: srahman):
[~amagyar] - As per my understanding, all msck command flows default to using 
MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given.
So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS, or am I 
missing anything? The above-mentioned problem doesn't arise if you use the 
default path.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263983#comment-17263983
 ] 

Syed Shameerur Rahman commented on HIVE-24584:
--

[~amagyar] - As per my understanding, all msck command flows default to using 
MsckPartitionExpressionProxy unless EXPRESSION_PROXY_CLASS is explicitly given.
So is there any reason for explicitly setting EXPRESSION_PROXY_CLASS, or am I 
missing anything? The above-mentioned problem doesn't arise if you use the 
default path.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following exception occurs when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-06 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Affects Version/s: 4.0.0

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove the DISTRIBUTE BY 
> clause or use a CLUSTER BY clause.
> It seems that the issue happens because DISTRIBUTE BY does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we use 
> DISTRIBUTE BY:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}



--
This message was sent by Atlassian Jira

[jira] [Updated] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-06 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Affects Version/s: 3.0.0
   3.1.1
   3.1.2

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0, 2.3.1, 2.3.2, 3.1.1, 3.1.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove the DISTRIBUTE BY 
> clause or use a CLUSTER BY clause.
> It seems that the issue happens because DISTRIBUTE BY does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might be re-used when we use 
> DISTRIBUTE BY:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ..

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] As per your comments, I have changed the implementation. 
Please review the PR.
Thanks.)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create an external table
> # Run the msck command to sync all the partitions with the metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering (a sketch of these steps follows)
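> 
> A sketch of these steps in HiveQL (table name and location are illustrative; 
> step 4 is shown with the SYNC PARTITIONS form, which exercises the 
> drop-partition path that fails below):
> {code:java}
> create external table repro (a string) partitioned by (p int)
>   location '/tmp/repro';
> -- after partition directories /tmp/repro/p=1 and /tmp/repro/p=2 exist:
> msck repair table repro;                  -- step 2: sync all partitions
> -- hdfs dfs -rm -r /tmp/repro/p=1         -- step 3: remove one partition path
> msck repair table repro sync partitions;  -- step 4: fails as in the stack trace
> {code}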
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions, we serialize the drop-partition filter 
> expression as in ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ); hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with the one in 
> PartitionExpressionForMetastore. We can do this via reflection, since the 
> drop-partition serialization happens in the Msck class (standalone-metastore); 
> by this way we can 

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] Does the new approach makes sense?)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions we serialize the drop partition filter 
> expression here ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection (see the sketch below), since the drop partition 
> serialization happens in the Msck class (standalone-metastore); this way we 
> can completely remove the need for class MsckPartitionExp
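
For solution #2, a minimal sketch of what the reflection-based serialization could look like. It assumes SerializationUtilities exposes a serializeExpressionToKryo counterpart to the deserializeExpressionFromKryo seen in the stack trace above; the wrapper class and the exact signature are assumptions.

{code:java}
import java.lang.reflect.Method;

// Hedged sketch of solution #2: serialize the drop partition filter with the
// same Kryo utilities that PartitionExpressionForMetastore deserializes with.
// SerializationUtilities lives in hive-exec, which standalone-metastore cannot
// depend on at compile time, hence the reflective lookup. The wrapper class and
// the assumed method signature are illustrative, not the committed patch.
public final class KryoExprSerializer {
  public static byte[] serialize(Object exprNodeGenericFuncDesc) throws Exception {
    Class<?> utils =
        Class.forName("org.apache.hadoop.hive.ql.exec.SerializationUtilities");
    Class<?> exprClass =
        Class.forName("org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc");
    // Assumed: public static byte[] serializeExpressionToKryo(ExprNodeGenericFuncDesc)
    Method m = utils.getMethod("serializeExpressionToKryo", exprClass);
    return (byte[]) m.invoke(null, exprNodeGenericFuncDesc);
  }
}
{code}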

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-05 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208490#comment-17208490
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] Could you please review the PR?

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions we serialize the drop partition filter 
> expression here ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection, since the drop partition serialization happens in the 
> Msck class (standalone-metastore); this way we can completely remove the need for

[jira] [Comment Edited] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208486#comment-17208486
 ] 

Syed Shameerur Rahman edited comment on HIVE-18284 at 10/6/20, 5:15 AM:


[~kgyrtkirk] [~jcamachorodriguez] [~ashutoshc] ping for review request!


was (Author: srahman):
[~kgyrtkirk] [~jcamachorodriguez] ping for review request!

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.p

[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208486#comment-17208486
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~kgyrtkirk] [~jcamachorodriguez] ping for review request!

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>

[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: [~jcamachorodriguez] Could you please review the PR?)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By (a toy illustration of this failure mode follows 
> after the stack trace).
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}
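
Because Distribute By only routes rows by key without sorting them, the same datekey can reach a reducer again after its writer state was already closed and removed. The following self-contained toy model (plain Java, not Hive code; all names are illustrative) reproduces that failure mode and shows the kind of null guard that avoids the NPE:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy model of the fsp lifecycle: one "writer" per dynamic-partition key.
// It mimics FileSinkOperator closing and removing the previous partition's
// state whenever the key changes; that is safe only if keys arrive sorted,
// a property DISTRIBUTE BY alone does not guarantee.
public class DynPartWriterDemo {
  private static final Map<Integer, StringBuilder> writers = new HashMap<>();
  private static Integer prevKey = null;

  static void process(String col1, int datekey) {
    if (prevKey != null && !prevKey.equals(datekey)) {
      writers.remove(prevKey);        // close-and-remove the previous "fsp"
    }
    StringBuilder w = writers.get(datekey);
    if (w == null) {                  // the guard: without it, w.append() below
      w = new StringBuilder();        // throws NullPointerException when a key
      writers.put(datekey, w);        // reappears after its entry was removed
    }
    w.append(col1).append('\n');
    prevKey = datekey;
  }

  public static void main(String[] args) {
    // Keys arrive unsorted, as "distribute by datekey" permits: 1, 2, 1
    process("ROW1", 1);
    process("ROW2", 2);                 // removes key 1's writer
    process("ROW3", 1);                 // key 1 reappears; the guard recreates its writer
    System.out.println(writers.get(1)); // prints "ROW3" (no NPE with the guard)
  }
}
{code}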




[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: [~kgyrtkirk]  I have addressed your comments. Please take a look!)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more

[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204472#comment-17204472
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~abstractdog] Could you please review the PR?

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean-up. Refer to 
> TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-29 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~gopalv] [~prasanth_j] Gentle reminder :)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean-up. Refer to 
> TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-29 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~gopalv] [~rajesh.balamohan] ping for review request!)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean-up. Refer to 
> TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204451#comment-17204451
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~jcamachorodriguez] Could you please review the PR?

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>  

[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: [~kgyrtkirk] [~jcamachorodriguez] Could you Please review the PR?)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more

[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-28 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203645#comment-17203645
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~kgyrtkirk]  I have addressed your comments. Please take a look!

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java

[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-28 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: *PR:* https://github.com/apache/hive/pull/1400
[~jcamachorodriguez] can you please review?

Thanks!)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, even though it might be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSo

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-22 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200544#comment-17200544
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] Does the new approach make sense?

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions we serialize the drop partition filter 
> expression here ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection, since the drop partition serialization happens in the 
> Msck class (standalone-metastore); this way we can completely remove the need

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-11 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194261#comment-17194261
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] As per your comments, I have changed the implementation. Please 
review the PR.
Thanks.

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering (a HiveQL sketch of these steps follows)
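> Table name and location below are hypothetical, and step 3 happens outside 
> Hive (e.g. via hdfs dfs -rm -r on one partition directory):
> {code:java}
> -- 1. create an external partitioned table (hypothetical name/location)
> CREATE EXTERNAL TABLE repro_tbl (col1 STRING)
> PARTITIONED BY (datekey INT)
> LOCATION '/tmp/repro_tbl';
>
> -- 2. sync all partitions present on the filesystem into the metastore
> MSCK REPAIR TABLE repro_tbl;
>
> -- 3. out-of-band: remove one partition path, e.g.
> --      hdfs dfs -rm -r /tmp/repro_tbl/datekey=1
>
> -- 4. repair with partition filtering; the DROP PARTITIONS path builds and
> --    serializes a drop-partition filter expression, failing as in the
> --    stack trace below
> MSCK REPAIR TABLE repro_tbl DROP PARTITIONS;
> {code}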
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions we serialize the drop partition filter 
> expression as in ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the 
> partition pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop 
> partition filter expression compatible with the one in 
> PartitionExpressionForMetastore. We can do this via reflection, since the 
> drop partition serialization happens in the Msck class 
> (standalone-metastore); this way we can completely remove the need for the 
> MsckPartitionExpressionProxy class.

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-11 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] [~jcamachorodriguez] Ping for review request!)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-11 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] I have raised a PR following approach 2. Could you 
please review?)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-11 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] Could you please review the PR?)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-11 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~jcamachorodriguez] [~kgyrtkirk] Could you please review the PR?
Thanks.)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-10 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193518#comment-17193518
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~kgyrtkirk] [~jcamachorodriguez] Could you please review the PR?

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>

[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-09-10 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: [~jcamachorodriguez] Could you please review the PR?)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized , non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove the Distribute By 
> clause or use a Cluster By clause instead.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might need to be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
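> As noted above, the same insert succeeds with Cluster By, which distributes 
> and additionally sorts on the key, so each partition's rows reach the 
> FileSinkOperator grouped together; a sketch using the tables and settings 
> from the repro above:
> {code:java}
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> cluster by datekey;
> {code}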
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}




[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-08 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192169#comment-17192169
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~jcamachorodriguez] [~kgyrtkirk] Could you please review the PR?
Thanks.

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>

[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-01 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188409#comment-17188409
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~gopalv] [~rajesh.balamohan] ping for review request!

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP.
> There are some added advantages of using Tez's dagDelete feature rather than 
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean up; refer to 
> TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-09-01 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~szita] [~rajesh.balamohan] Ping for review request!)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-24 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183160#comment-17183160
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~jcamachorodriguez] Could you please review the PR?

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-08-24 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183159#comment-17183159
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] Could you please review the PR?

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>

[jira] [Updated] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Environment: (was: EMR)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Description: 
A Null Pointer Exception occurs when inserting data with 'distribute by' 
clause. The following snippet query reproduces this issue:
*(non-vectorized, non-LLAP mode)*

{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

I could run the insert query without the error if I remove Distribute By or 
use the Cluster By clause.
It seems that the issue happens because Distribute By does not guarantee 
clustering or sorting properties on the distributed keys.
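
For comparison, the working variant mentioned above (the same insert with 
Cluster By in place of Distribute By) looks like this; Cluster By is 
equivalent to Distribute By plus Sort By on the same key, so each reducer 
receives its datekey values in sorted order:

{code:java}
-- Cluster By distributes AND sorts on datekey, so each reducer sees its
-- partition keys grouped together and the insert succeeds.
set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
cluster by datekey;
{code}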

FileSinkOperator removes the previous fsp, which might still need to be re-used 
when we use Distribute By.
https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972

The following stack trace is logged.

{code:java}
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
failure ) : 
attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
... 17 more
{code}



  was:
A Null Pointer Exception occurs when inserting data with 'distribute by' 
clause. The following snippet query reproduces this issue:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

I could run the insert query without the error if I remove Distribute By or 
use the Cluster By clause.
It seems that the issue happens because Distribute By does not guarantee 
clustering or sorting properties on the distributed keys.

[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-13 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176832#comment-17176832
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

*PR:* https://github.com/apache/hive/pull/1400
[~jcamachorodriguez] can you please review?

Thanks!

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By  or 
> use Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still need to be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Description: 
A Null Pointer Exception occurs when inserting data with 'distribute by' 
clause. The following snippet query reproduces this issue:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

I could run the insert query without the error if I remove Distribute By or 
use the Cluster By clause.
It seems that the issue happens because Distribute By does not guarantee 
clustering or sorting properties on the distributed keys.

FileSinkOperator removes the previous fsp, which might still need to be re-used 
when we use Distribute By.
https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972

The following stack trace is logged.

{code:java}
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
failure ) : 
attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
... 17 more
{code}



  was:
A Null Pointer Exception occurs when inserting data with 'distribute by' 
clause. The following snippet query reproduces this issue:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

I could run the insert query without the error if I remove Distribute By or 
use the Cluster By clause.
It seems that the issue happens because Distribute By does not guarantee 
clustering or sorting properties on the distributed keys.

FileSinkOperator removes the previous fsp, which might still need to be re-used 
when we use Distribute By.

[jira] [Updated] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Summary: NPE when inserting data with 'distribute by' clause with dynpart 
sort optimization  (was: NPE when inserting data with 'distribute by' clause)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By  or 
> use Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still need to be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Assigned] (HIVE-18284) NPE when inserting data with 'distribute by' clause

2020-08-13 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-18284:


Assignee: Syed Shameerur Rahman  (was: Lynch Lee)

> NPE when inserting data with 'distribute by' clause
> ---
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By  or 
> use Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still need to be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause

2020-08-07 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173124#comment-17173124
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

Steps to reproduce:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

When *hive.optimize.sort.dynamic.partition=true* we expect the keys to be 
sorted on the reducer side so that reducers can keep only one record writer 
open at any time, thereby reducing the memory pressure on the reducers 
(HIVE-6455). But from the query execution it looks like the property of the 
*distribute by* clause (it does not guarantee clustering or sorting properties on 
the distributed keys) takes precedence, hence the keys at the reducer side 
are not in sorted order and the insert fails with a null pointer exception.

[~prasanth_j] [~jcamachorodriguez] [~vikram.dixit] any thoughts on this?
*Note*: This issue is not seen when vectorization is enabled
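
As a rough, self-contained sketch of that failure mode (an assumed 
simplification for illustration, not Hive's actual FileSinkOperator code): the 
sink creates each partition's writer exactly once and drops the previous one on 
every key change, on the assumption that keys arrive grouped; with Distribute 
By alone that assumption breaks, and a revisited key finds its writer already 
gone.

{code:java}
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: mimics a sink that assumes rows arrive
// grouped by partition key, as the dynpart sort optimization expects.
public class DynPartSortSketch {
  static Map<Integer, StringWriter> writers = new HashMap<>();
  static Integer currentKey = null;

  static void process(int key, String row) {
    if (!Integer.valueOf(key).equals(currentKey)) {
      if (currentKey != null) {
        writers.put(currentKey, null); // previous fsp is closed and dropped
      }
      if (!writers.containsKey(key)) {
        writers.put(key, new StringWriter()); // one writer per key, created once
      }
      currentKey = key;
    }
    // Distribute By does not sort, so an already-seen key can reappear
    // after its writer was dropped; get() then returns null.
    writers.get(key).write(row + "\n");
  }

  public static void main(String[] args) {
    process(1, "ROW1");
    process(2, "ROW2");
    process(1, "ROW3"); // out-of-order revisit -> NullPointerException
  }
}
{code}

With keys delivered in grouped order (1, 1, 2) the same sketch runs fine, 
which matches the observation that removing Distribute By or using Cluster By 
avoids the error.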


> NPE when inserting data with 'distribute by' clause
> ---
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Lynch Lee
>Priority: Major
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By  or 
> use Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still need to be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>

[jira] [Comment Edited] (HIVE-18284) NPE when inserting data with 'distribute by' clause

2020-08-07 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173124#comment-17173124
 ] 

Syed Shameerur Rahman edited comment on HIVE-18284 at 8/7/20, 1:01 PM:
---

Steps to reproduce:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

When *hive.optimize.sort.dynamic.partition=true* we expect the keys to be 
sorted on the reducer side so that reducers can keep only one record writer 
open at any time, thereby reducing the memory pressure on the reducers 
(HIVE-6455). But from the query execution it looks like the property of the 
*distribute by* clause (it does not guarantee clustering or sorting properties on 
the distributed keys) takes precedence, hence the keys at the reducer side 
are not in sorted order and the insert fails with a null pointer exception.

[~prasanth_j] [~jcamachorodriguez] [~vikram.dixit] any thoughts on this?
*Note*: This issue is not seen when vectorization is enabled



was (Author: srahman):
Steps to reproduce:


{code:java}
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);

set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2
PARTITION(datekey)
select col1,
datekey
from table1
distribute by datekey ;
{code}

when * hive.optimize.sort.dynamic.partition=true* we expect the keys to be 
sorted on the reducer side so that reducers can keep only one record writer 
open at any time, thereby reducing the memory pressure on the reducers 
(HIVE-6455). But from the query execution it looks like the property of the 
*distribute by* clause (it does not guarantee clustering or sorting properties on 
the distributed keys) takes precedence, hence the keys at the reducer side 
are not in sorted order and the insert fails with a null pointer exception.

[~prasanth_j] [~jcamachorodriguez] [~vikram.dixit] any thoughts on this?
*Note*: This issue is not seen when vectorization is enabled


> NPE when inserting data with 'distribute by' clause
> ---
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
> Environment: EMR
>Reporter: Aki Tanaka
>Assignee: Lynch Lee
>Priority: Major
>
> A Null Pointer Exception occurs when inserting data with 'distribute by' 
> clause. The following snippet query reproduces this issue:
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By  or 
> use Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still need to be re-used 
> when we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at

[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-08-06 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172076#comment-17172076
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~szita] [~rajesh.balamohan] Ping for review request!

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed-task-attempt shuffle data clean-up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-08-03 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170537#comment-17170537
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] [~jcamachorodriguez] Ping for review request!

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect the expression proxy 
> class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions we serialize the drop-partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
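
To make the incompatibility concrete, here is a minimal, self-contained sketch 
(an analogy, not Hive's code: plain JDK serialization stands in for the msck 
drop-partition writer, and a stock Kryo instance stands in for the 
PartitionExpressionForMetastore reader). Bytes produced under one scheme cannot 
be read back under the other, which is the shape of the "Failed to deserialize 
the expression" error above.

{code:java}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

// Illustrative only: the writer and the reader must agree on the scheme.
public class SerializationMismatchSketch {
  public static void main(String[] args) throws Exception {
    // "Writer" side: a filter expression serialized with JDK serialization.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject("datekey = 1");
    }

    // "Reader" side: expects Kryo-encoded bytes and fails instead.
    Kryo kryo = new Kryo();
    try (Input input = new Input(new ByteArrayInputStream(bos.toByteArray()))) {
      System.out.println(kryo.readObject(input, String.class));
    } catch (Exception e) {
      System.out.println("Failed to deserialize the expression: " + e);
    }
  }
}
{code}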
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization process in the msck 
> drop-partition filter expression compatible with the one in 
> PartitionExpressionForMetastore. We can do this via reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore); by this 
> way we can completely remo

[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166157#comment-17166157
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~gopalv] [~prasanth_j] Gentle reminder.

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed-task-attempt shuffle data clean-up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-27 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: [~ashutoshc] [~rajesh.balamohan] Could you please review?)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed-task-attempt shuffle data clean-up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166155#comment-17166155
 ] 

Syed Shameerur Rahman commented on HIVE-23873:
--

[~chiran54321] To run Hive pre-commit tests you need to raise a pull request on 
GitHub (hive repo). Can you please do the same, and please add a test around 
this case?

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch, 
> HIVE-23873.3.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data it fails with an NPE. Below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case, since in 
> Hive table and column names are forced to lowercase during 
> creation. The user runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, so 
> the lookup fails to get the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
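
As a self-contained illustration of that lookup mismatch (a sketch, not the 
actual JdbcSerDe code): the row map is keyed by the database's upper-case 
column names while the lookup uses Hive's lower-cased names, so the map lookup 
returns null and the later dereference is the NPE; normalizing the case on one 
side avoids it.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of the case mismatch described above (illustrative only).
public class ColumnCaseSketch {
  public static void main(String[] args) {
    // Row as returned by Oracle: upper-case column names.
    Map<String, Object> row = new HashMap<>();
    row.put("ID", 1);
    row.put("FNAME", "Name1");

    // Hive-side lookup uses the lower-cased column name.
    Object value = row.get("id");
    System.out.println("naive lookup: " + value); // null -> dereferencing it NPEs

    // Normalizing the keys (or the lookup) to one case fixes the mismatch.
    Map<String, Object> normalized = new HashMap<>();
    row.forEach((k, v) -> normalized.put(k.toLowerCase(), v));
    System.out.println("normalized lookup: " + normalized.get("id"));
  }
}
{code}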
> Log snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", 
> "hive

[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165584#comment-17165584
 ] 

Syed Shameerur Rahman edited comment on HIVE-23873 at 7/27/20, 9:59 AM:


[~chiran54321] Any update on this? I can pick this up if you are busy with other 
things.


was (Author: srahman):
[~chiran54321] Any update on this? I can pick this if you don't have the 
bandwidth.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data it fails with an NPE. Below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case, since in 
> Hive table and column names are forced to lowercase during 
> creation. The user runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, so 
> the lookup fails to get the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive

[jira] [Commented] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-27 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165584#comment-17165584
 ] 

Syed Shameerur Rahman commented on HIVE-23873:
--

[~chiran54321] Any update on this? I can pick this if you don't have the 
bandwidth.

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data it fails with an NPE. Below is the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case, since in 
> Hive table and column names are forced to lowercase during 
> creation. The user runs into an NPE while fetching data.
> While deserializing data, the input consists of column names in lower case, so 
> the lookup fails to get the value:
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this case.
> =
> 1. Create table in Oracle
> {code}
> create table TESTHIVEJDBCSTORAGE(ID INT, FNAME VARCHAR(20));
> {code}
> 2. Insert dummy data.
> {code}
> Insert into TESTHIVEJDBCSTORAGE values (1, 'Name1');
> {code}
> 3. Create JDBCStorageHandler table in Hive.
> {code}
> CREATE EXTERNAL TABLE default.TESTHIVEJDBCSTORAGE_HIVE_TBL (ID INT, FNAME 
> VARCHAR(20)) 
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' 
> TBLPROPERTIES ( 
> "hive.sql.database.type" = "ORACLE", 
> "hive.sql.jdbc.driver" = "oracle.jdbc.OracleDriver", 
> "hive.sql.jdbc.url" = "jdbc:oracle:thin:@orachehostname/XE", 
> "hive.sql.dbcp.username" = "chiran", 
> "hive.sq

[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-22 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161746#comment-17161746
 ] 

Syed Shameerur Rahman edited comment on HIVE-23873 at 7/22/20, 10:50 AM:
-

[~chiran54321] Even though it makes sense to change the column names to lower 
case, I am not sure of the side effects it could have.

The core of the problem here is the inconsistency in the value of 
hiveColumnList between JdbcSerDe and JdbcRecordIterator. We could do something 
like 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L103
 in 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/JdbcRecordIterator.java#L62
 when conf.get(Constants.JDBC_QUERY) == null, which I think will solve this 
problem (see the sketch below).

cc: [~jcamachorodriguez]
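
A hedged sketch of that suggestion, assuming the linked JdbcSerDe line boils 
down to a plain toLowerCase() over the configured column list; the class and 
method names below are hypothetical, not the actual patch:

{code}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ColumnNameNormalizer {

  // When hive.sql.query is NOT set (conf.get(Constants.JDBC_QUERY) == null),
  // the column list comes from the Hive table definition, which stores names
  // in lower case. Normalizing here keeps JdbcRecordIterator's row keys
  // consistent with the keys JdbcSerDe later uses for its lookups.
  static List<String> hiveColumnNames(String jdbcQuery, String configuredColumns) {
    List<String> names = Arrays.asList(configuredColumns.split(","));
    if (jdbcQuery == null) {
      names = names.stream()
          .map(String::toLowerCase)
          .collect(Collectors.toList());
    }
    return names;
  }
}
{code}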


was (Author: srahman):
[~chiran54321] Even though it makes sense to change the column names to lower 
case, I am not sure of the side effects it could have.

The core of the problem here is the inconsistency in the value of 
hiveColumnList between JdbcSerDe and JdbcRecordIterator. We could do something 
like 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L103
 in 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/JdbcRecordIterator.java#L62
 when conf.get(Constants.JDBC_QUERY) == null, which I think will solve this 
problem.

cc: #[~jcamachorodriguez]

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data, it fails with an NPE. Below is 
> the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case, while in 
> Hive, table and column names are forced to lowercase during creation. The 
> user then runs into an NPE while fetching data.
> While deserializing, the incoming row is keyed by column names in lower 
> case, so the lookup fails to get the value at
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42

[jira] [Comment Edited] (HIVE-23873) Querying Hive JDBCStorageHandler table fails with NPE

2020-07-22 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161746#comment-17161746
 ] 

Syed Shameerur Rahman edited comment on HIVE-23873 at 7/22/20, 10:49 AM:
-

[~chiran54321] Even though it makes sense to change the column names to lower 
case, I am not sure of the side effects it could have.

The core of the problem here is the inconsistency in the value of 
hiveColumnList between JdbcSerDe and JdbcRecordIterator. We could do something 
like 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L103
 in 
https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/JdbcRecordIterator.java#L62
 when conf.get(Constants.JDBC_QUERY) == null, which I think will solve this 
problem.

cc: #[~jcamachorodriguez]


was (Author: srahman):
cc: [~jcamachorodriguez]

> Querying Hive JDBCStorageHandler table fails with NPE
> -
>
> Key: HIVE-23873
> URL: https://issues.apache.org/jira/browse/HIVE-23873
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Critical
> Attachments: HIVE-23873.01.patch, HIVE-23873.02.patch
>
>
> The scenario is a Hive table having the same schema as a table in Oracle; 
> however, when we query the table with data, it fails with an NPE. Below is 
> the trace.
> {code}
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:617)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hive.storage.jdbc.JdbcSerDe.deserialize(JdbcSerDe.java:164) 
> ~[hive-jdbc-handler-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:598)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:524) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2739) 
> ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>  ~[hive-exec-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
>  ~[hive-service-3.1.0.3.1.5.0-152.jar:3.1.0.3.1.5.0-152]
> ... 34 more
> {code}
> The problem appears when column names in Oracle are in upper case, while in 
> Hive, table and column names are forced to lowercase during creation. The 
> user then runs into an NPE while fetching data.
> While deserializing, the incoming row is keyed by column names in lower 
> case, so the lookup fails to get the value at
> https://github.com/apache/hive/blob/rel/release-3.1.2/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcSerDe.java#L136
> {code}
> rowVal = ((ObjectWritable)value).get();
> {code}
> Log snippet:
> =
> {code}
> 2020-07-17T16:49:09,598 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: dao.GenericJdbcDatabaseAccessor (:()) 
> - Query to execute is [select * from TESTHIVEJDBCSTORAGE]
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** ColumnKey = 
> ID
> 2020-07-17T16:49:10,642 INFO  [04ed42ec-91d2-4662-aee7-37e840a06036 
> HiveServer2-Handler-Pool: Thread-104]: jdbc.JdbcSerDe (:()) - *** Blob value 
> = {fname=OW[class=class java.lang.String,value=Name1], id=OW[class=class 
> java.lang.Integer,value=1]}
> {code}
> Simple Reproducer for this 
