[jira] [Created] (HIVE-28503) Wrong results(NULL) when string concat operation with || operator for ORC file format when vectorization enabled

2024-09-05 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created HIVE-28503:
-

 Summary: Wrong results(NULL) when string concat operation with || 
operator for ORC file format when vectorization enabled
 Key: HIVE-28503
 URL: https://issues.apache.org/jira/browse/HIVE-28503
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Mahesh Raju Somalaraju
Assignee: Mahesh Raju Somalaraju


Wrong results (NULL) when a string concat operation with the || operator is run on the ORC file 
format with vectorization enabled.

set hive.query.results.cache.enabled=false;

set hive.fetch.task.conversion=none;

set hive.vectorized.execution.enabled=true;

The result is NULL when we do a concat operation with the || operator. The issue is not 
reproducible locally; it reproduces on a cluster with more records. The input 
data should be a mix of NULL and NOT NULL values, something like below.

Create a table with the ORC file format and 3 string columns, and insert data 
in such a way that it contains a mix of NULL and NOT NULL values.

 
|column1|column2|column3|count|
|NULL      |NULL     |NULL      |18000  |
|G         |L        |A1        |123932 |

With the above configuration, perform the concat operation with the || operator and 
insert a new row with the concatenated result.

select * from (select t1.column1, t1.column2, t1.column3, *t1.column1 || t1.column2 || t1.column3 as VEH_MODEL_ID*
from test_table t1) t
where VEH_MODEL_ID is NULL and if(column1 is null,0,1)=1 AND if(column2 is null,0,1)=1 AND if(column3 is null,0,1)=1 limit 1;

In the above query, the *t1.column1 || t1.column2 || t1.column3 as VEH_MODEL_ID* 
expression returns NULL even though the input string values are 
not null.
|t.VEH_MODEL_ID|t.column1|t.column2|t.column3|
|NULL|G|L|A2|

 

+Proposed solution as per code review:+

+*Root cause:*+

While doing the concat() operation in the *StringGroupConcatColCol* class, if the input 
batch vectors contain a mix of NULL and NOT NULL values, the NULL-related flags on the 
output vector batch are not set correctly. Each value in a vector carries a per-row 
flag indicating whether it is NULL, but the vector-level flag of the whole output 
vector (outV.noNulls) is not set correctly. Without this flag the same query still 
works for Parquet, presumably because that path checks the per-row flags instead of 
relying on the vector-level flag.

+*code snippet:*+

*StringGroupConcatColCol->evaluate() method:*

if (inV1.noNulls && !inV2.noNulls) {  *--> the second input may contain NULLs, so the output may contain NULLs*

outV.noNulls = false; *--> mark the output vector as possibly containing NULLs*

...

}

else if (!inV1.noNulls && inV2.noNulls) {  *--> the first input may contain NULLs, so the output may contain NULLs*

outV.noNulls = false; *--> mark the output vector as possibly containing NULLs*

...

}

else if (!inV1.noNulls && !inV2.noNulls) {  *--> both inputs may contain NULLs, so the output may contain NULLs*

outV.noNulls = false; *--> mark the output vector as possibly containing NULLs*

...

}

else {                  *--> there are no NULLs in either input vector*

{color:#4c9aff}*outV.noNulls = true;  --> this has to be set to true as there are no NULL values; this assignment is currently missing.*{color}
// perform data operation

...

}
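
To make the intended flag handling concrete, here is a minimal, self-contained Java sketch of the same null-flag propagation. It uses a simplified stand-in class rather than Hive's real column vector API, so all names in it are illustrative only and not the actual patch.

{code}
// Simplified model of the null-flag propagation described above.
// "MiniVector" is a stand-in for a Hive column vector; names are illustrative.
public class ConcatNullFlagSketch {

  static class MiniVector {
    boolean noNulls = true;   // vector-level flag: true means no row is NULL
    boolean[] isNull;         // per-row NULL flags
    String[] vals;

    MiniVector(int n) {
      isNull = new boolean[n];
      vals = new String[n];
    }
  }

  // Concatenate two input vectors into the output vector, keeping both the
  // per-row isNull[] flags and the vector-level noNulls flag consistent.
  static void concat(MiniVector in1, MiniVector in2, MiniVector out, int n) {
    if (in1.noNulls && in2.noNulls) {
      out.noNulls = true;     // the assignment the ticket says is currently missing
      for (int i = 0; i < n; i++) {
        out.isNull[i] = false;
        out.vals[i] = in1.vals[i] + in2.vals[i];
      }
    } else {
      out.noNulls = false;    // at least one input may contain NULLs
      for (int i = 0; i < n; i++) {
        boolean nullRow = (!in1.noNulls && in1.isNull[i]) || (!in2.noNulls && in2.isNull[i]);
        out.isNull[i] = nullRow;
        out.vals[i] = nullRow ? null : in1.vals[i] + in2.vals[i];
      }
    }
  }

  public static void main(String[] args) {
    int n = 3;
    MiniVector a = new MiniVector(n);
    MiniVector b = new MiniVector(n);
    MiniVector out = new MiniVector(n);
    a.vals = new String[] {"G", null, "L"};
    a.noNulls = false;
    a.isNull[1] = true;
    b.vals = new String[] {"1", "2", "3"};
    concat(a, b, out, n);
    for (int i = 0; i < n; i++) {
      // prints G1, NULL, L3 and out.noNulls stays false
      System.out.println(out.isNull[i] ? "NULL" : out.vals[i]);
    }
  }
}
{code}

A reader that trusts only the vector-level noNulls flag (as the ORC vectorized path apparently does) gets correct answers only once that flag is also set to true in the no-nulls branch.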



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27512) CalciteSemanticException.UnsupportedFeature enum to capital

2023-11-22 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju updated HIVE-27512:
--
Status: Patch Available  (was: In Progress)

> CalciteSemanticException.UnsupportedFeature enum to capital
> ---
>
> Key: HIVE-27512
> URL: https://issues.apache.org/jira/browse/HIVE-27512
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie, pull-request-available
>
> https://github.com/apache/hive/blob/3bc62cbc2d42c22dfd55f78ad7b41ec84a71380f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java#L32-L39
> {code}
>   public enum UnsupportedFeature {
> Distinct_without_an_aggreggation, Duplicates_in_RR, 
> Filter_expression_with_non_boolean_return_type,
> Having_clause_without_any_groupby, Invalid_column_reference, 
> Invalid_decimal,
> Less_than_equal_greater_than, Others, Same_name_in_multiple_expressions,
> Schema_less_table, Select_alias_in_having_clause, Select_transform, 
> Subquery,
> Table_sample_clauses, UDTF, Union_type, Unique_join,
> HighPrecissionTimestamp // CALCITE-1690
>   };
> {code}
> this just hurts my eyes, I expect it as DISTINCT_WITHOUT_AN_AGGREGATION ...
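
A minimal sketch of what the renamed enum could look like, assuming the constants are simply converted to upper snake case; whether spelling slips such as Aggreggation/Precission are also corrected, and how the call sites are updated, is up to the actual patch:

{code}
// Illustrative only; the existing names come from CalciteSemanticException,
// and the upper-case spellings below are an assumption about the final change.
public enum UnsupportedFeature {
  DISTINCT_WITHOUT_AN_AGGREGATION, DUPLICATES_IN_RR,
  FILTER_EXPRESSION_WITH_NON_BOOLEAN_RETURN_TYPE,
  HAVING_CLAUSE_WITHOUT_ANY_GROUPBY, INVALID_COLUMN_REFERENCE, INVALID_DECIMAL,
  LESS_THAN_EQUAL_GREATER_THAN, OTHERS, SAME_NAME_IN_MULTIPLE_EXPRESSIONS,
  SCHEMA_LESS_TABLE, SELECT_ALIAS_IN_HAVING_CLAUSE, SELECT_TRANSFORM, SUBQUERY,
  TABLE_SAMPLE_CLAUSES, UDTF, UNION_TYPE, UNIQUE_JOIN,
  HIGH_PRECISION_TIMESTAMP // CALCITE-1690
}
{code}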



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-27512) CalciteSemanticException.UnsupportedFeature enum to capital

2023-11-21 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27512 started by Mahesh Raju Somalaraju.
-
> CalciteSemanticException.UnsupportedFeature enum to capital
> ---
>
> Key: HIVE-27512
> URL: https://issues.apache.org/jira/browse/HIVE-27512
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie
>
> https://github.com/apache/hive/blob/3bc62cbc2d42c22dfd55f78ad7b41ec84a71380f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java#L32-L39
> {code}
>   public enum UnsupportedFeature {
> Distinct_without_an_aggreggation, Duplicates_in_RR, 
> Filter_expression_with_non_boolean_return_type,
> Having_clause_without_any_groupby, Invalid_column_reference, 
> Invalid_decimal,
> Less_than_equal_greater_than, Others, Same_name_in_multiple_expressions,
> Schema_less_table, Select_alias_in_having_clause, Select_transform, 
> Subquery,
> Table_sample_clauses, UDTF, Union_type, Unique_join,
> HighPrecissionTimestamp // CALCITE-1690
>   };
> {code}
> this just hurts my eyes, I expect it as DISTINCT_WITHOUT_AN_AGGREGATION ...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27512) CalciteSemanticException.UnsupportedFeature enum to capital

2023-11-20 Thread Mahesh Raju Somalaraju (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788299#comment-17788299
 ] 

Mahesh Raju Somalaraju commented on HIVE-27512:
---

[~abstractdog] I have assigned this Jira to myself and will raise a PR.

> CalciteSemanticException.UnsupportedFeature enum to capital
> ---
>
> Key: HIVE-27512
> URL: https://issues.apache.org/jira/browse/HIVE-27512
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie
>
> https://github.com/apache/hive/blob/3bc62cbc2d42c22dfd55f78ad7b41ec84a71380f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java#L32-L39
> {code}
>   public enum UnsupportedFeature {
> Distinct_without_an_aggreggation, Duplicates_in_RR, 
> Filter_expression_with_non_boolean_return_type,
> Having_clause_without_any_groupby, Invalid_column_reference, 
> Invalid_decimal,
> Less_than_equal_greater_than, Others, Same_name_in_multiple_expressions,
> Schema_less_table, Select_alias_in_having_clause, Select_transform, 
> Subquery,
> Table_sample_clauses, UDTF, Union_type, Unique_join,
> HighPrecissionTimestamp // CALCITE-1690
>   };
> {code}
> this just hurts my eyes, I expect it as DISTINCT_WITHOUT_AN_AGGREGATION ...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27512) CalciteSemanticException.UnsupportedFeature enum to capital

2023-11-20 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-27512:
-

Assignee: Mahesh Raju Somalaraju

> CalciteSemanticException.UnsupportedFeature enum to capital
> ---
>
> Key: HIVE-27512
> URL: https://issues.apache.org/jira/browse/HIVE-27512
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie
>
> https://github.com/apache/hive/blob/3bc62cbc2d42c22dfd55f78ad7b41ec84a71380f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java#L32-L39
> {code}
>   public enum UnsupportedFeature {
> Distinct_without_an_aggreggation, Duplicates_in_RR, 
> Filter_expression_with_non_boolean_return_type,
> Having_clause_without_any_groupby, Invalid_column_reference, 
> Invalid_decimal,
> Less_than_equal_greater_than, Others, Same_name_in_multiple_expressions,
> Schema_less_table, Select_alias_in_having_clause, Select_transform, 
> Subquery,
> Table_sample_clauses, UDTF, Union_type, Unique_join,
> HighPrecissionTimestamp // CALCITE-1690
>   };
> {code}
> this just hurts my eyes, I expect it as DISTINCT_WITHOUT_AN_AGGREGATION ...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27703) Remove PowerMock from itests/hive-jmh and upgrade mockito to 4.11

2023-11-20 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju resolved HIVE-27703.
---
Resolution: Duplicate

Handled as part of: https://issues.apache.org/jira/browse/HIVE-27736

> Remove PowerMock from itests/hive-jmh and upgrade mockito to 4.11
> -
>
> Key: HIVE-27703
> URL: https://issues.apache.org/jira/browse/HIVE-27703
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Priority: Major
>  Labels: newbie, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27871) Fix some formatting problems is YarnQueueHelper

2023-11-20 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju updated HIVE-27871:
--
Status: Patch Available  (was: Open)

> Fix some formatting problems is YarnQueueHelper
> ---
>
> Key: HIVE-27871
> URL: https://issues.apache.org/jira/browse/HIVE-27871
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie, pull-request-available
>
> https://github.com/apache/hive/blob/cbc5d2d7d650f90882c5c4ad0026a94d2e586acb/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/YarnQueueHelper.java#L54-L57
> {code}
>   private static String webapp_conf_key = YarnConfiguration.RM_WEBAPP_ADDRESS;
>   private static String webapp_ssl_conf_key = 
> YarnConfiguration.RM_WEBAPP_HTTPS_ADDRESS;
>   private static String yarn_HA_enabled = YarnConfiguration.RM_HA_ENABLED;
>   private static String yarn_HA_rmids = YarnConfiguration.RM_HA_IDS;
> {code}
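
A minimal sketch of the cleanup the ticket asks for, assuming the fields become private static final constants in upper snake case (the exact field names are an assumption about the eventual patch):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative sketch only; the final names in the merged change may differ.
public final class YarnQueueHelperConstantsSketch {
  private static final String WEBAPP_CONF_KEY = YarnConfiguration.RM_WEBAPP_ADDRESS;
  private static final String WEBAPP_SSL_CONF_KEY = YarnConfiguration.RM_WEBAPP_HTTPS_ADDRESS;
  private static final String YARN_HA_ENABLED = YarnConfiguration.RM_HA_ENABLED;
  private static final String YARN_HA_RMIDS = YarnConfiguration.RM_HA_IDS;
}
{code}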



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27871) Fix some formatting problems is YarnQueueHelper

2023-11-14 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-27871:
-

Assignee: Mahesh Raju Somalaraju

> Fix some formatting problems is YarnQueueHelper
> ---
>
> Key: HIVE-27871
> URL: https://issues.apache.org/jira/browse/HIVE-27871
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie
>
> https://github.com/apache/hive/blob/cbc5d2d7d650f90882c5c4ad0026a94d2e586acb/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/YarnQueueHelper.java#L54-L57
> {code}
>   private static String webapp_conf_key = YarnConfiguration.RM_WEBAPP_ADDRESS;
>   private static String webapp_ssl_conf_key = 
> YarnConfiguration.RM_WEBAPP_HTTPS_ADDRESS;
>   private static String yarn_HA_enabled = YarnConfiguration.RM_HA_ENABLED;
>   private static String yarn_HA_rmids = YarnConfiguration.RM_HA_IDS;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27679) Ranger Yarn Queue policies are not applying correctly, rework done for HIVE-26352

2023-09-11 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-27679:
-

Assignee: Mahesh Raju Somalaraju

> Ranger Yarn Queue policies are not applying correctly, rework done for 
> HIVE-26352
> -
>
> Key: HIVE-27679
> URL: https://issues.apache.org/jira/browse/HIVE-27679
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> This Jira is raised to modify/fix the code which was done as part of 
> *HIVE-26352.*
> Versions which have {*}HIVE-26352{*}/HIVE-27029 are not able to enforce Yarn 
> Ranger queue policies, because the change made in {*}HIVE-26352{*}/HIVE-27029 
> catches ALL exceptions, so exceptions that would normally be thrown are ignored 
> and the user is allowed to run a job in that queue.
> Allowing the user to run such jobs is not the expected behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27679) Ranger Yarn Queue policies are not applying correctly, rework done for HIVE-26352

2023-09-10 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created HIVE-27679:
-

 Summary: Ranger Yarn Queue policies are not applying correctly, 
rework done for HIVE-26352
 Key: HIVE-27679
 URL: https://issues.apache.org/jira/browse/HIVE-27679
 Project: Hive
  Issue Type: Bug
Reporter: Mahesh Raju Somalaraju


This Jira is raised to modify/fix the code which was done as part of 
*HIVE-26352.*

Versions which have {*}HIVE-26352{*}/HIVE-27029 are not able to enforce Yarn 
Ranger queue policies, because the change made in {*}HIVE-26352{*}/HIVE-27029 
catches ALL exceptions, so exceptions that would normally be thrown are ignored 
and the user is allowed to run a job in that queue.

Allowing the user to run such jobs is not the expected behaviour.
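
The description implies the queue access check is wrapped in a catch-all that hides failures. Below is a minimal, self-contained Java sketch contrasting that shape with the intended behaviour; it is not the real YarnQueueHelper code, and checkQueueAccessInternal here is just a stand-in name borrowed from the snippet quoted in HIVE-27029.

{code}
import java.io.IOException;

// Illustrative sketch only (not the actual Hive code).
public class QueueAccessCheckSketch {

  // Stand-in for the real check; throws when the user has no access to the queue.
  static void checkQueueAccessInternal(String queue, String user) throws IOException {
    throw new IOException("user " + user + " has no access to queue " + queue);
  }

  // Problematic shape described in the ticket: every exception is caught,
  // so the caller never learns the check failed and the job runs anyway.
  static boolean lenientCheck(String queue, String user) {
    try {
      checkQueueAccessInternal(queue, user);
      return true;
    } catch (Exception e) {
      System.err.println("access check failed but was ignored: " + e.getMessage());
      return true; // access is effectively granted by accident
    }
  }

  // Intended shape: the access failure propagates, so the query is rejected.
  static void strictCheck(String queue, String user) throws IOException {
    checkQueueAccessInternal(queue, user);
  }

  public static void main(String[] args) {
    System.out.println("lenient result: " + lenientCheck("default", "alice"));
    try {
      strictCheck("default", "alice");
    } catch (IOException e) {
      System.out.println("strict check rejected the job: " + e.getMessage());
    }
  }
}
{code}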



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL

2023-08-23 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju resolved HIVE-27303.
---
Resolution: Fixed

> select query result is different when enable/disable mapjoin with UNION ALL
> ---
>
> Key: HIVE-27303
> URL: https://issues.apache.org/jira/browse/HIVE-27303
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: pull-request-available
>
> select query result is different when enable/disable mapjoin with UNION ALL
> Below are the reproduce steps.
> As per the query, when map join is disabled it should not return (duplicate) rows. 
> The same works fine with map.join=true.
> Expected result: empty rows.
> Problem: duplicate rows are returned.
> Steps:
> --
> SET hive.server2.tez.queue.access.check=true;
> SET tez.queue.name=default
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion=none;
> SET hive.execution.engine=tez;
> SET hive.stats.autogather=true;
> SET hive.server2.enable.doAs=false;
> SET hive.auto.convert.join=false;
> drop table if exists hive1_tbl_data;
> drop table if exists hive2_tbl_data;
> drop table if exists hive3_tbl_data;
> drop table if exists hive4_tbl_data;
> CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>    CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>  
> insert into table hive1_tbl_data select 
> '1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';
> insert into table hive1_tbl_data select 
> '2','john','doe','j...@hotmail.com','2014-01-01 
> 12:01:02','4000-1';insert into table hive2_tbl_data select 
> '1','john','doe','j...@hotmail.com','201

[jira] [Commented] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL

2023-08-23 Thread Mahesh Raju Somalaraju (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758346#comment-17758346
 ] 

Mahesh Raju Somalaraju commented on HIVE-27303:
---

Merged the PR.

https://github.com/apache/hive/pull/4406

> select query result is different when enable/disable mapjoin with UNION ALL
> ---
>
> Key: HIVE-27303
> URL: https://issues.apache.org/jira/browse/HIVE-27303
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: pull-request-available
>
> select query result is different when enable/disable mapjoin with UNION ALL
> Below are the reproduce steps.
> As per the query, when map join is disabled it should not return (duplicate) rows. 
> The same works fine with map.join=true.
> Expected result: empty rows.
> Problem: duplicate rows are returned.
> Steps:
> --
> SET hive.server2.tez.queue.access.check=true;
> SET tez.queue.name=default
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion=none;
> SET hive.execution.engine=tez;
> SET hive.stats.autogather=true;
> SET hive.server2.enable.doAs=false;
> SET hive.auto.convert.join=false;
> drop table if exists hive1_tbl_data;
> drop table if exists hive2_tbl_data;
> drop table if exists hive3_tbl_data;
> drop table if exists hive4_tbl_data;
> CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>    CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>  
> insert into table hive1_tbl_data select 
> '1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';
> insert into table hive1_tbl_data select 
> '2','john','doe','j...@hotmail.com','2014-01-01 
> 12:01:02','4000-100

[jira] [Commented] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL

2023-07-31 Thread Mahesh Raju Somalaraju (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749470#comment-17749470
 ] 

Mahesh Raju Somalaraju commented on HIVE-27303:
---

[~seonggon] Thanks for your PR, let me verify it. If the fix works fine then 
the PR can be merged.

> select query result is different when enable/disable mapjoin with UNION ALL
> ---
>
> Key: HIVE-27303
> URL: https://issues.apache.org/jira/browse/HIVE-27303
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: pull-request-available
>
> select query result is different when enable/disable mapjoin with UNION ALL
> Below are the reproduce steps.
> As per the query, when map join is disabled it should not return (duplicate) rows. 
> The same works fine with map.join=true.
> Expected result: empty rows.
> Problem: duplicate rows are returned.
> Steps:
> --
> SET hive.server2.tez.queue.access.check=true;
> SET tez.queue.name=default
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion=none;
> SET hive.execution.engine=tez;
> SET hive.stats.autogather=true;
> SET hive.server2.enable.doAs=false;
> SET hive.auto.convert.join=false;
> drop table if exists hive1_tbl_data;
> drop table if exists hive2_tbl_data;
> drop table if exists hive3_tbl_data;
> drop table if exists hive4_tbl_data;
> CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
> CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>    CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN 
> string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
> string) 
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
>  STORED AS INPUTFORMAT                              
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
>  OUTPUTFORMAT                                       
>    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  TBLPROPERTIES (                                    
>    'TRANSLATED_TO_EXTERNAL'='true',                 
>    'bucketing_version'='2',                         
>    'external.table.purge'='true',                   
>    'parquet.compression'='SNAPPY');
>  
> insert into table hive1_tbl_data select 
> '1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';
> insert into table hive1_tbl_data select 
> '2','john','doe','j...@hot

[jira] [Updated] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL

2023-05-22 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju updated HIVE-27303:
--
Description: 
select query result is different when enable/disable mapjoin with UNION ALL

Below are the reproduce steps.

As per the query, when map join is disabled it should not return (duplicate) rows. The same 
works fine with map.join=true.

Expected result: empty rows.

Problem: duplicate rows are returned.

Steps:

--

SET hive.server2.tez.queue.access.check=true;
SET tez.queue.name=default
SET hive.query.results.cache.enabled=false;
SET hive.fetch.task.conversion=none;
SET hive.execution.engine=tez;
SET hive.stats.autogather=true;
SET hive.server2.enable.doAs=false;
SET hive.auto.convert.join=false;

drop table if exists hive1_tbl_data;
drop table if exists hive2_tbl_data;
drop table if exists hive3_tbl_data;
drop table if exists hive4_tbl_data;

CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN string,COLUMN_LN 
string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');

CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN string,COLUMN_LN 
string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');

CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN string,COLUMN_LN 
string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');

   CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN 
string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');

 

insert into table hive1_tbl_data select 
'1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';

insert into table hive1_tbl_data select 
'2','john','doe','j...@hotmail.com','2014-01-01 
12:01:02','4000-1';insert into table hive2_tbl_data select 
'1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','1'; 
insert into table hive2_tbl_data select 
'2','john','doe','j...@hotmail.com','2014-01-01 12:01:02','1'; 

 

select
       t.COLUMID
  from (
      select distinct
          t.COLUMID as COLUMID
      from (SELECT COLUMID FROM hive3_tbl_data UNION ALL SELECT COLUMID FROM 
hive1_tbl_data) t
  ) t
  left join (
      select
           distinct t.COLUMID
      from (SELECT COLUMID FROM hive4_tbl_data UNION ALL SELECT COLUMID FROM 
hive2_tbl_data) t
  ) t1 on t.COLUMID = t1.COLUMID
  where t1.COLUMID is null;

 

  was:
select query result is different when enable/disable mapjoin with UNION ALL

Below are the 

[jira] [Created] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL

2023-04-27 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created HIVE-27303:
-

 Summary: select query result is different when enable/disable 
mapjoin with UNION ALL
 Key: HIVE-27303
 URL: https://issues.apache.org/jira/browse/HIVE-27303
 Project: Hive
  Issue Type: Bug
Reporter: Mahesh Raju Somalaraju
Assignee: Mahesh Raju Somalaraju


select query result is different when enable/disable mapjoin with UNION ALL

Below are the reproduce steps.

As per the query, when map join is disabled it should not return (duplicate) rows. The same 
works fine with map.join=true.

Expected result: empty rows.

Problem: duplicate rows are returned.

Steps:

--

SET hive.server2.tez.queue.access.check=true;
SET tez.queue.name=default
SET hive.query.results.cache.enabled=false;
SET hive.fetch.task.conversion=none;
SET hive.execution.engine=tez;
SET hive.stats.autogather=true;
SET hive.server2.enable.doAs=false;
SET hive.auto.convert.join=true;


drop table if exists hive1_tbl_data;
drop table if exists hive2_tbl_data;
drop table if exists hive3_tbl_data;
drop table if exists hive4_tbl_data;


CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN string,COLUMN_LN 
string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');

CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN string,COLUMN_LN 
string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');


CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN string,COLUMN_LN 
string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');


   CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN 
string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM 
string) 
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 TBLPROPERTIES (                                    
   'TRANSLATED_TO_EXTERNAL'='true',                 
   'bucketing_version'='2',                         
   'external.table.purge'='true',                   
   'parquet.compression'='SNAPPY');

 


insert into table hive1_tbl_data select 
'1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1';

insert into table hive1_tbl_data select 
'2','john','doe','j...@hotmail.com','2014-01-01 
12:01:02','4000-1';insert into table hive2_tbl_data select 
'1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','1'; 
insert into table hive2_tbl_data select 
'2','john','doe','j...@hotmail.com','2014-01-01 12:01:02','1'; 

 

select
       t.COLUMID
  from (
      select distinct
          t.COLUMID as COLUMID
      from (SELECT COLUMID FROM hive3_tbl_data UNION ALL SELECT COLUMID FROM 
hive1_tbl_data) t
  ) t
  left join (
      select
           distinct t.COLUMID
      from (SELECT COLUMID FROM hive4_tbl_data UNI

[jira] [Resolved] (HIVE-27196) Upgrade jettision version to 1.5.4 due to CVEs

2023-04-24 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju resolved HIVE-27196.
---
Resolution: Duplicate

This is fixed as part of HIVE-27286.

Hence closing this Jira.

> Upgrade jettision version to 1.5.4 due to CVEs
> --
>
> Key: HIVE-27196
> URL: https://issues.apache.org/jira/browse/HIVE-27196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> [CVE-2023-1436|https://www.cve.org/CVERecord?id=CVE-2023-1436]
> [CWE-400|https://cwe.mitre.org/data/definitions/400.html]
> Need to update jettison version to 1.5.4 version due to above CVE issues.
> version 1.5.4 has no CVE issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27196) Upgrade jettision version to 1.5.4 due to CVEs

2023-04-24 Thread Mahesh Raju Somalaraju (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716084#comment-17716084
 ] 

Mahesh Raju Somalaraju commented on HIVE-27196:
---

This is fixed as part of HIVE-27286.

Hence closing this Jira.

> Upgrade jettision version to 1.5.4 due to CVEs
> --
>
> Key: HIVE-27196
> URL: https://issues.apache.org/jira/browse/HIVE-27196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> [CVE-2023-1436|https://www.cve.org/CVERecord?id=CVE-2023-1436]
> [CWE-400|https://cwe.mitre.org/data/definitions/400.html]
> Need to update jettison version to 1.5.4 version due to above CVE issues.
> version 1.5.4 has no CVE issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27198) Delete directly aborted transactions instead of select and loading ids

2023-03-30 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-27198:
-


> Delete directly aborted transactions instead of select and loading ids
> --
>
> Key: HIVE-27198
> URL: https://issues.apache.org/jira/browse/HIVE-27198
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> In cleaning the aborted transactions, we can directly delete the txns 
> instead of selecting and processing them.
> method name: 
> cleanEmptyAbortedAndCommittedTxns
> Code:
> String s = "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE " +
> "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
> " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " + 
> TxnStatus.COMMITTED + ") AND "
> + " \"TXN_ID\" < " + lowWaterMark;
>  
> proposed code:
> String s = "DELETE \"TXN_ID\" FROM \"TXNS\" WHERE " +
> "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
> " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " + 
> TxnStatus.COMMITTED + ") AND "
> + " \"TXN_ID\" < " + lowWaterMark;
>  
> The select needs to be eliminated and the delete should work with the where 
> clause directly instead of the IN clause built from the loaded ids.
> We see no reason for loading the ids into memory and then generating a huge 
> SQL statement.
>  
> Batching is also not necessary here; we can delete the records directly.
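
For reference, a sketch of what the direct delete could look like. Standard SQL DELETE removes whole rows, so the column list from the proposed snippet is dropped; the table and column names are taken from the snippet above, while the surrounding JDBC wrapper and the hard-coded state characters are assumptions for illustration only.

{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch only: issue the delete directly instead of selecting ids first.
public final class DeleteAbortedTxnsSketch {

  static int deleteEmptyAbortedAndCommittedTxns(Connection conn, long lowWaterMark)
      throws SQLException {
    // 'a' / 'c' stand in for TxnStatus.ABORTED / TxnStatus.COMMITTED (assumed rendering).
    String sql = "DELETE FROM \"TXNS\" WHERE "
        + "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND "
        + "(\"TXN_STATE\" = 'a' OR \"TXN_STATE\" = 'c') AND "
        + "\"TXN_ID\" < " + lowWaterMark;
    try (Statement stmt = conn.createStatement()) {
      // Rows are removed in a single statement; no ids are loaded into memory.
      return stmt.executeUpdate(sql);
    }
  }
}
{code}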



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27196) Upgrade jettision version to 1.5.4 due to CVEs

2023-03-30 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-27196:
-


> Upgrade jettision version to 1.5.4 due to CVEs
> --
>
> Key: HIVE-27196
> URL: https://issues.apache.org/jira/browse/HIVE-27196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> [CVE-2023-1436|https://www.cve.org/CVERecord?id=CVE-2023-1436]
> [CWE-400|https://cwe.mitre.org/data/definitions/400.html]
> Need to update jettison version to 1.5.4 version due to above CVE issues.
> version 1.5.4 has no CVE issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27029) hive query fails with Filesystem closed error

2023-02-06 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju updated HIVE-27029:
--
Description: 
This Jira is raised to modify/fix the code which is done in part of 
*HIVE-26352.*

 

We should remove the finally block, as it is causing the filesystem closed 
errors.

String queueName, String userName) throws IOException, InterruptedException {
  UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
  try {
    ugi.doAs((PrivilegedExceptionAction) () -> {
      checkQueueAccessInternal(queueName, userName);
      return null;
    });
  } {color:#0747a6}*finally {*{color}
    {color:#0747a6}*try {*{color}
      {color:#0747a6}*FileSystem.closeAllForUGI(ugi);*{color}
    } catch (IOException exception) {
      LOG.error("Could not clean up file-system handles for UGI: " + ugi, exception);
    }
  }
}
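
A minimal sketch of the proposed change: the same doAs call, but without the finally block that invoked FileSystem.closeAllForUGI(ugi) and thereby closed filesystem handles still in use elsewhere. The method name checkQueueAccess and the stub body are assumptions; only the structure mirrors the snippet above.

{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch only (not the actual Hive code).
public class QueueAccessNoCloseSketch {

  void checkQueueAccess(String queueName, String userName)
      throws IOException, InterruptedException {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    // No finally { FileSystem.closeAllForUGI(ugi); } block: the cached
    // filesystem objects for this UGI stay open for later use.
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      checkQueueAccessInternal(queueName, userName);
      return null;
    });
  }

  // Stand-in for the real access check quoted in the description.
  void checkQueueAccessInternal(String queueName, String userName) {
    // the real implementation calls the YARN ResourceManager web services
  }
}
{code}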

 

Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:483) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2771) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2796)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2793)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2812)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:374)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.getEncryptionZoneForPath(Hadoop23Shims.java:1384)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1379)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2484)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
 

 

  was:
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:483) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2771) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2796)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2793)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2812)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:374)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.getEncryptionZoneForPath(Hadoop23Shims.java:1384)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1379)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2484)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
 

 


> hive query fails with Filesystem closed error
> -
>
> Key: HIVE-27029
> URL: https://issues.apache.org/jira/browse/HIVE-27029
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> This Jira is raised to modify/fix the code which is done in part of 
> *HIVE-26352.*
>  
> we should remove the finally block as this is causing the filesystem close 
> errors.
> String queueName, String userName) throws IOException, InterruptedException {
> UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
> try {
> ugi.doAs((PrivilegedExceptionAction) () -> {
> checkQueueAccessInternal(queueName, userName);
> return null;
> });
> } {color:#0747a6}*finally {*{color}
> {color:#0747a6} *try {*{color}
> {color:#0747a6} *FileSystem.closeAllForUGI(ugi);*{color}
> } catch (IOException exception) {
> LOG.error("Could not clean up file-system handles for UGI: " + ugi, 
> exception);
> }
> }
> }
>  
> Caused by: java.io.IOException: Filesystem closed
> at org

[jira] [Updated] (HIVE-27029) hive query fails with Filesystem closed error

2023-02-06 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju updated HIVE-27029:
--
Description: 
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:483) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2771) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2796)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2793)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2812)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:374)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.getEncryptionZoneForPath(Hadoop23Shims.java:1384)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1379)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2484)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
 

 

  was:
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:483) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2771) 
~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2796)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2793)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2812)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:374)
 ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.getEncryptionZoneForPath(Hadoop23Shims.java:1384)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1379)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2484)
 ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
 

 

steps:
1. When using explicit queue name the tez.queue.access is used. If statistics 
gathering is enabled the second insert fails at the compute_stats() phase.

beeline --hiveconf tez.queue.name=default  -e "
SET hive.query.results.cache.enabled=false;
SET hive.fetch.task.conversion=none;

SET hive.stats.autogather=true;

drop table if exists default.bigd35368p100;
create  table default.bigd35368p100 (name string) partitioned by ( id int);
insert into default.bigd35368p100 select * from default.bigd35368e100;

drop table if exists default.bigd35368p100;
create  table default.bigd35368p100 (name string) partitioned by ( id int);
insert into default.bigd35368p100 select * from default.bigd35368e100;
"


> hive query fails with Filesystem closed error
> -
>
> Key: HIVE-27029
> URL: https://issues.apache.org/jira/browse/HIVE-27029
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:483) 
> ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2771) 
> ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2796)
>  ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2793)
>  ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)

[jira] [Assigned] (HIVE-27029) hive query fails with Filesystem closed error

2023-02-06 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-27029:
-

Assignee: Mahesh Raju Somalaraju

> hive query fails with Filesystem closed error
> -
>
> Key: HIVE-27029
> URL: https://issues.apache.org/jira/browse/HIVE-27029
> Project: Hive
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:483) 
> ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2771) 
> ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2796)
>  ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$54.doCall(DistributedFileSystem.java:2793)
>  ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2812)
>  ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:374)
>  ~[hadoop-hdfs-client-3.1.1.7.1.8.11-3.jar:?]
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.getEncryptionZoneForPath(Hadoop23Shims.java:1384)
>  ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1379)
>  ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2484)
>  ~[hive-exec-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
>  
>  
> steps:
> 1. When using explicit queue name the tez.queue.access is used. If statistics 
> gathering is enabled the second insert fails at the compute_stats() phase.
> beeline --hiveconf tez.queue.name=default  -e "
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion=none;
> SET hive.stats.autogather=true;
> drop table if exists default.bigd35368p100;
> create  table default.bigd35368p100 (name string) partitioned by ( id int);
> insert into default.bigd35368p100 select * from default.bigd35368e100;
> drop table if exists default.bigd35368p100;
> create  table default.bigd35368p100 (name string) partitioned by ( id int);
> insert into default.bigd35368p100 select * from default.bigd35368e100;
> "



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26983) Apache Hive website Getting started page showing 404 Error

2023-01-25 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju reassigned HIVE-26983:
-


> Apache Hive website Getting started page showing 404 Error
> --
>
> Key: HIVE-26983
> URL: https://issues.apache.org/jira/browse/HIVE-26983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Minor
>
> [https://hive.apache.org/GettingStarted]  When we click this page we get a 
> 404 Not Found page. Need to check and fix the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)