Xin Hao created HIVE-12089:
------------------------------
Summary: HiveException (Failed to close AbstractFileMergeOperator) occurs while loading data into an ORC table when hive.merge.sparkfiles is set to true. [Spark Branch]
Key: HIVE-12089
URL: https://issues.apache.org/jira/browse/HIVE-12089
Project: Hive
Issue Type: Sub-task
Reporter: Xin Hao
Component Version:
(1) Hive Spark branch at commit 70eeadd2f019dcb2e301690290c8807731eab7a1, plus the HIVE-11473 patch (HIVE-11473.3-spark.patch), which adds Spark 1.5 support for Hive on Spark
(2) Spark 1.5.1
Case used:
Big-Bench data load (load data from HDFS into the Hive warehouse, stored as ORC format). The related HiveQL:
DROP TABLE IF EXISTS customer_temporary;
CREATE EXTERNAL TABLE customer_temporary
( c_customer_sk bigint --not null
, c_customer_id string --not null
, c_current_cdemo_sk bigint
, c_current_hdemo_sk bigint
, c_current_addr_sk bigint
, c_first_shipto_date_sk bigint
, c_first_sales_date_sk bigint
, c_salutation string
, c_first_name string
, c_last_name string
, c_preferred_cust_flag string
, c_birth_day int
, c_birth_month int
, c_birth_year int
, c_birth_country string
, c_login string
, c_email_address string
, c_last_review_date string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION '/user/root/benchmarks/bigbench_n1t/data/customer'
;
DROP TABLE IF EXISTS customer;
CREATE TABLE customer
STORED AS ORC
AS
SELECT * FROM customer_temporary
;
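Note: the script above does not show the session-level settings. The ones assumed to be in effect for this run (hive.merge.sparkfiles per the summary; the execution engine is an assumption based on the component versions above) would look like:
SET hive.execution.engine=spark;   -- Hive on Spark (assumed from the Spark branch/Spark 1.5.1 setup)
SET hive.merge.sparkfiles=true;    -- enables the ORC file-merge step that fails below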
Error/Exception Message:
15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/000001_0
15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/000001_0 [ offset : 3 length: 10525754 row: 247500 ]
15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge Operator OFM
15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 4)
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
    at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:235)
    at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:236)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:113)
    ... 15 more
Caused by: java.io.IOException: Unable to rename hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_task_tmp.-ext-10001/_tmp.000001_0 to hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_tmp.-ext-10001/000001_0
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:208)
    ... 17 more
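For triage, one way to check that the Spark-side merge step is the trigger (an assumption based on the summary and the stack trace, not verified here) is to re-run the same CTAS in a session with the merge step disabled:
SET hive.execution.engine=spark;
SET hive.merge.sparkfiles=false;   -- skip the ORC file-merge step suspected above

DROP TABLE IF EXISTS customer;
CREATE TABLE customer
STORED AS ORC
AS
SELECT * FROM customer_temporary
;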