[jira] [Created] (HIVE-26273) “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty
michaelli created HIVE-26273:
--------------------------------

             Summary: “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty
                 Key: HIVE-26273
                 URL: https://issues.apache.org/jira/browse/HIVE-26273
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 2.1.1
            Reporter: michaelli
         Attachments: execution plan for good run.txt, execution plan for issue run.txt, issue log.txt

*Issue summary:*

When inner joining tableA to tableB on a partition key of tableB, if dynamic partition pruning is enabled and tableA is empty, the query fails with the exception below:

{code}
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/hive/fddbc5ac-3596-428d-8b42-cbc61952d182/hive_2022-05-30_14-03-17_139_1843975612196554546-15339/-mr-10003/2/1 does not exist. (state=42000,code=3)
{code}

I encountered this with hive-2.1.1-cdh6.3.2, and I think it occurs in other versions too.

*Steps to reproduce the issue:*

1. Prepare the tables:

{code}
CREATE TABLE tableA (
  businsys_no decimal(10,0),
  acct_id string,
  prod_code string)
PARTITIONED BY (init_date int)
STORED AS orc;

CREATE TABLE tableB (
  client_id string,
  open_date decimal(10,0),
  client_status string,
  organ_flag string)
PARTITIONED BY (businsys_no decimal(10,0))
STORED AS orc;
{code}

2. Prepare data for the tables:

{code}
-- tableA should be empty
-- prepare some data for tableB
{code}

3. Run the steps below to reproduce the issue:

{code}
set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;

select *
from (select * from tableA fp where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
{code}

4. Currently we turn off spark dynamic partition pruning to work around this:

{code}
set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.spark.dynamic.partition.pruning.map.join.only=false;

select *
from (select * from tableA fp where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
{code}

*Execution logs and execution plan:*

The execution logs and execution plans are attached.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
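The failure pattern suggests the consumer of the pruning output assumes the producer always writes a file, which does not hold when the small table is empty. A minimal sketch of a defensive fix, using plain `java.nio` instead of Hive/HDFS classes; `DppFileGuard` and `readPruningValues` are hypothetical names, not Hive's actual code:

```java
// Hypothetical sketch: if the small-table side of the map join produced no rows,
// the dynamic-partition-pruning temp file may never have been written; treating
// the missing file as "no surviving partitions" avoids failing the whole query.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public class DppFileGuard {
    // Returns the pruning values read from the temp file, or an empty list when
    // the producer wrote nothing (empty source table) and the file is absent.
    static List<String> readPruningValues(Path tmpFile) {
        if (!Files.exists(tmpFile)) {
            // Empty source relation: prune every partition instead of failing.
            return Collections.emptyList();
        }
        try {
            return Files.readAllLines(tmpFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulate the failing case: the -mr-10003/... file was never created.
        System.out.println(readPruningValues(Path.of("nonexistent-dpp-output")).size()); // prints 0
    }
}
```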
[jira] [Created] (HIVE-26274) No vectorization if query has upper case window function
Krisztian Kasa created HIVE-26274:
-------------------------------------

             Summary: No vectorization if query has upper case window function
                 Key: HIVE-26274
                 URL: https://issues.apache.org/jira/browse/HIVE-26274
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa

{code}
CREATE TABLE t1 (a int, b int);

EXPLAIN VECTORIZATION ONLY
SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
{code}

{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      Vertices:
        Map 1
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2
            Execution mode: llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez] IS true
                notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
                vectorized: false

  Stage: Stage-0
    Fetch Operator
{code}

{code}
notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
{code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
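The `notVectorizedReason` shows `ROW_NUMBER` failing a membership test against a lower-case list that contains `row_number`, so the likely fix is a case-insensitive comparison. A minimal sketch of that idea; `PtfFunctionCheck` and `isVectorizable` are illustrative names, not Hive's actual code:

```java
// Hypothetical sketch: normalize the window-function name before checking it
// against the supported-function list, so "ROW_NUMBER" matches "row_number".
// The list below mirrors the notVectorizedReason message in the plan.
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PtfFunctionCheck {
    static final List<String> SUPPORTED = Arrays.asList(
        "avg", "count", "dense_rank", "first_value", "lag", "last_value",
        "lead", "max", "min", "rank", "row_number", "sum");

    static boolean isVectorizable(String functionName) {
        // Lower-casing keeps the check independent of how the query was written.
        return SUPPORTED.contains(functionName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isVectorizable("ROW_NUMBER")); // prints true
    }
}
```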
[jira] [Created] (HIVE-26275) Make LlapBaseInputFormat type safe
Hankó Gergely created HIVE-26275:
------------------------------------

             Summary: Make LlapBaseInputFormat type safe
                 Key: HIVE-26275
                 URL: https://issues.apache.org/jira/browse/HIVE-26275
             Project: Hive
          Issue Type: Improvement
          Components: llap
            Reporter: Hankó Gergely

The code of LlapBaseInputFormat is not type safe. It contains suppressed warnings for unchecked casts and raw type usage, and it is used as a raw type everywhere.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
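A small sketch of the raw-type problem the issue describes and the generic alternative; `RecordReaderSketch` is a hypothetical stand-in, not the actual LlapBaseInputFormat API:

```java
// Hypothetical sketch: a raw-typed reader forces callers to cast and to add
// @SuppressWarnings("unchecked"); parameterizing it over the row type lets the
// compiler verify the element type instead.
import java.util.ArrayList;
import java.util.List;

public class RecordReaderSketch {
    // Generic style: callers get compile-time checking with no casts.
    static class RecordReader<V> {
        private final List<V> rows = new ArrayList<>();
        void add(V row) { rows.add(row); }
        List<V> rows() { return rows; }
    }

    public static void main(String[] args) {
        RecordReader<String> reader = new RecordReader<>();
        reader.add("row1");                       // no cast, no suppressed warning
        String first = reader.rows().get(0);      // type checked by the compiler
        System.out.println(first); // prints row1
    }
}
```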
[jira] [Created] (HIVE-26276) Fix package to org.apache.hadoop.hive.serde2 for JsonSerDe & RegexSerDe in HMS DB
Naresh P R created HIVE-26276:
---------------------------------

             Summary: Fix package to org.apache.hadoop.hive.serde2 for JsonSerDe & RegexSerDe in HMS DB
                 Key: HIVE-26276
                 URL: https://issues.apache.org/jira/browse/HIVE-26276
             Project: Hive
          Issue Type: Bug
            Reporter: Naresh P R

Similar to HIVE-24770, JsonSerDe & RegexSerDe should be updated to the newer package:

{code:java}
// Avoid dependency on hive-hcatalog.jar
Old - org.apache.hive.hcatalog.data.JsonSerDe
New - org.apache.hadoop.hive.serde2.JsonSerDe

// Avoid dependency on hive-contrib.jar
Old - org.apache.hadoop.hive.contrib.serde2.RegexSerDe
New - org.apache.hadoop.hive.serde2.RegexSerDe
{code}

This should be handled in the upgrade flow.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
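The core of such an upgrade step is a rename map from the legacy SerDe class names to their serde2 equivalents. A minimal sketch of that mapping, using only the class names stated in the issue; `SerDePackageUpgrade` and `upgrade` are illustrative names, and how the metastore rows are actually rewritten is left to the real upgrade flow:

```java
// Hypothetical sketch: map legacy SerDe class names to the serde2 package,
// leaving any other class name untouched.
import java.util.HashMap;
import java.util.Map;

public class SerDePackageUpgrade {
    static final Map<String, String> RENAMES = new HashMap<>();
    static {
        // Avoids the hive-hcatalog.jar dependency.
        RENAMES.put("org.apache.hive.hcatalog.data.JsonSerDe",
                    "org.apache.hadoop.hive.serde2.JsonSerDe");
        // Avoids the hive-contrib.jar dependency.
        RENAMES.put("org.apache.hadoop.hive.contrib.serde2.RegexSerDe",
                    "org.apache.hadoop.hive.serde2.RegexSerDe");
    }

    // Returns the new class name for a legacy SerDe, or the input unchanged.
    static String upgrade(String serdeClass) {
        return RENAMES.getOrDefault(serdeClass, serdeClass);
    }

    public static void main(String[] args) {
        System.out.println(upgrade("org.apache.hive.hcatalog.data.JsonSerDe"));
        // prints org.apache.hadoop.hive.serde2.JsonSerDe
    }
}
```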