[jira] [Created] (HIVE-26273) “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty

2022-05-31 Thread michaelli (Jira)
michaelli created HIVE-26273:


 Summary: “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty
 Key: HIVE-26273
 URL: https://issues.apache.org/jira/browse/HIVE-26273
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 2.1.1
Reporter: michaelli
 Attachments: execution plan for good run.txt, execution plan for issue run.txt, issue log.txt

*Issue summary:*

When tableA is inner-joined to tableB on tableB's partition key and dynamic partition pruning is enabled, the query fails with the following exception if tableA is empty:

Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/hive/fddbc5ac-3596-428d-8b42-cbc61952d182/hive_2022-05-30_14-03-17_139_1843975612196554546-15339/-mr-10003/2/1 does not exist. (state=42000,code=3).

I encountered this with hive-2.1.1-cdh6.3.2, and I think other versions are affected too.

*Steps to reproduce the issue:*

1. Prepare the tables:
CREATE TABLE tableA (
  businsys_no decimal(10,0),
  acct_id string,
  prod_code string)
PARTITIONED BY (init_date int)
STORED AS ORC;
CREATE TABLE tableB (
  client_id string,
  open_date decimal(10,0),
  client_status string,
  organ_flag string)
PARTITIONED BY (businsys_no decimal(10,0))
STORED AS ORC;

2. Prepare data for the tables:

-- tableA should be empty;
-- prepare some data for tableB

3. Run the following statements to reproduce the issue:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;
select *
  from (select *
          from tableA fp
         where fp.init_date = 20220525) cfp
 inner join (select ic.client_id, ic.businsys_no
               from tableB ic) ici
    on cfp.businsys_no = ici.businsys_no
   and cfp.acct_id = ici.client_id;
4. As a workaround, we currently turn off Spark dynamic partition pruning:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.spark.dynamic.partition.pruning.map.join.only=false;
select *
  from (select *
          from tableA fp
         where fp.init_date = 20220525) cfp
 inner join (select ic.client_id, ic.businsys_no
               from tableB ic) ici
    on cfp.businsys_no = ici.businsys_no
   and cfp.acct_id = ici.client_id;

 

*Execution logs and execution plans:*

The execution logs and execution plans are attached to the issue.
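The issue does not state a root cause. The following is a hedged sketch of the behaviour one would expect. It assumes that pruning collects the small table's join-key values and keeps only the big-table partitions whose key is in that set. With an empty tableA, that set is empty, so every tableB partition should be pruned and the inner join should return no rows instead of failing. `PruningSketch` and `prune` are hypothetical names, not Hive classes.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the expected pruning semantics; not Hive's implementation.
final class PruningSketch {
    private PruningSketch() {}

    /**
     * Returns the big-table partitions whose key occurs among the small table's
     * join-key values. For an inner join, an empty small table means no partition
     * can match, so every partition is pruned instead of raising an error.
     */
    static List<BigDecimal> prune(Collection<BigDecimal> smallTableKeys,
                                  List<BigDecimal> bigTablePartitions) {
        List<BigDecimal> kept = new ArrayList<>();
        if (smallTableKeys.isEmpty()) {
            return kept; // empty small table: nothing can match, prune all partitions
        }
        // BigDecimal.equals is scale-sensitive; acceptable here because
        // decimal(10,0) partition keys all have scale 0.
        Set<BigDecimal> keys = new HashSet<>(smallTableKeys);
        for (BigDecimal partition : bigTablePartitions) {
            if (keys.contains(partition)) {
                kept.add(partition);
            }
        }
        return kept;
    }
}
```

For example, `prune(emptyList, [1, 2, 3])` returns an empty list, and `prune([2], [1, 2, 3])` returns `[2]`.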

 

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26274) No vectorization if query has upper case window function

2022-05-31 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-26274:

 Summary: No vectorization if query has upper case window function
 Key: HIVE-26274
 URL: https://issues.apache.org/jira/browse/HIVE-26274
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
CREATE TABLE t1 (a int, b int);

EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
{code}
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      Vertices:
        Map 1
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2
            Execution mode: llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez] IS true
                notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
                vectorized: false

  Stage: Stage-0
    Fetch Operator
{code}
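The reducer is not vectorized because the upper-case name ROW_NUMBER is checked against a list of lower-case function names:
{code}
notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
{code}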
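HiveQL function names are case-insensitive, but the message above suggests that the name is compared exactly as written against lower-case entries. The following is a minimal sketch of a case-insensitive check. It assumes the supported names are kept in a lower-case set. `WindowFunctionSupport` and its methods are hypothetical and are not the actual Hive vectorizer code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the actual Hive vectorizer code.
final class WindowFunctionSupport {
    // Names copied from the notVectorizedReason message above (all lower case).
    private static final Set<String> SUPPORTED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "avg", "count", "dense_rank", "first_value", "lag", "last_value",
            "lead", "max", "min", "rank", "row_number", "sum")));

    private WindowFunctionSupport() {}

    /** Exact-match lookup: "ROW_NUMBER" is rejected, as in the reported plan. */
    static boolean isSupportedCaseSensitive(String functionName) {
        return SUPPORTED.contains(functionName);
    }

    /** Normalizes the name before the lookup, so "ROW_NUMBER" and "row_number" both match. */
    static boolean isSupported(String functionName) {
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return SUPPORTED.contains(functionName.toLowerCase(Locale.ROOT));
    }
}
```

Normalizing once at lookup time keeps the supported list in a single canonical form, rather than listing every case variant.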





[jira] [Created] (HIVE-26275) Make LlapBaseInputFormat type safe

2022-05-31 Thread Jira
Hankó Gergely created HIVE-26275:


 Summary: Make LlapBaseInputFormat type safe
 Key: HIVE-26275
 URL: https://issues.apache.org/jira/browse/HIVE-26275
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Hankó Gergely


The code of LlapBaseInputFormat is not type safe. It contains suppressed 
warnings for unchecked casts and raw type usage, and it is used as a raw type 
everywhere.
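To illustrate what "type safe" means here, the following is a minimal hypothetical example that contrasts a raw type with a generic one. `RawReader` and `TypedReader` are invented for illustration. They are not part of LlapBaseInputFormat, and the issue does not describe its actual type parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; neither class is Hive code.
// Before: a raw List needs suppressed warnings, and callers must cast,
// so a wrong cast only fails at run time with a ClassCastException.
@SuppressWarnings({"rawtypes", "unchecked"})
final class RawReader {
    private final List values = new ArrayList();

    void add(Object value) { values.add(value); }

    Object get(int index) { return values.get(index); }
}

// After: the value type is a type parameter, so the compiler checks every use
// and no casts or warning suppressions are needed.
final class TypedReader<V> {
    private final List<V> values = new ArrayList<>();

    void add(V value) { values.add(value); }

    V get(int index) { return values.get(index); }
}
```

With the generic version, `TypedReader<String>.get(0)` returns a `String` without a cast. Adding an `Integer` to a `TypedReader<String>` is a compile-time error rather than a run-time failure.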





[jira] [Created] (HIVE-26276) Fix package to org.apache.hadoop.hive.serde2 for JsonSerDe & RegexSerDe in HMS DB

2022-05-31 Thread Naresh P R (Jira)
Naresh P R created HIVE-26276:

 Summary: Fix package to org.apache.hadoop.hive.serde2 for JsonSerDe & RegexSerDe in HMS DB
 Key: HIVE-26276
 URL: https://issues.apache.org/jira/browse/HIVE-26276
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R


Similar to HIVE-24770, JsonSerDe & RegexSerDe should be updated to the newer package:
{code:java}
// Avoid dependency on hive-hcatalog.jar
Old - org.apache.hive.hcatalog.data.JsonSerDe
New - org.apache.hadoop.hive.serde2.JsonSerDe

// Avoid dependency on hive-contrib.jar
Old - org.apache.hadoop.hive.contrib.serde2.RegexSerDe
New - org.apache.hadoop.hive.serde2.RegexSerDe
{code}
This should be handled in the upgrade flow.
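The following is a minimal sketch of the rename mapping that the upgrade flow would have to apply to SerDe class names stored in the HMS database. It assumes each stored name is handled as a plain string. `SerDeRename` and `upgrade` are hypothetical, and the issue does not say how the upgrade flow should perform the rewrite.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Hive or its schema upgrade tooling.
final class SerDeRename {
    private static final Map<String, String> RENAMES = new HashMap<>();

    static {
        // Avoid dependency on hive-hcatalog.jar
        RENAMES.put("org.apache.hive.hcatalog.data.JsonSerDe",
                    "org.apache.hadoop.hive.serde2.JsonSerDe");
        // Avoid dependency on hive-contrib.jar
        RENAMES.put("org.apache.hadoop.hive.contrib.serde2.RegexSerDe",
                    "org.apache.hadoop.hive.serde2.RegexSerDe");
    }

    private SerDeRename() {}

    /** Returns the new class name for a relocated SerDe, or the input unchanged. */
    static String upgrade(String serdeClass) {
        return RENAMES.getOrDefault(serdeClass, serdeClass);
    }
}
```

Names that are not in the map pass through unchanged, so the same step can safely run over every stored SerDe.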


