[jira] [Updated] (HIVE-15054) Hive insertion query execution fails on Hive on Spark

Aihua Xu (JIRA) Tue, 25 Oct 2016 09:05:49 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aihua Xu updated HIVE-15054:
----------------------------
    Status: Patch Available  (was: Open)

patch-1: initial patch. It changes the id from the partitionId to the 
taskAttemptId. This is expected to cause some test failures, per the comment. 
I will check the test results and fix the failures in a follow-up patch.

The problem: if we use partitionId as part of {{mapred.task.id}}, and the 
taskId is used as the file name in FileSinkOp, then a retry of the same task 
produces the same file name and the two attempts conflict. Switching to 
taskAttemptId, which should be unique per attempt, avoids the conflict.
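
To illustrate the conflict described above, here is a minimal, self-contained 
sketch. It is not the Hive code and not the patch: the class TaskIdSketch, 
both helper methods, the attempt id values, and the zero-padded "%06d_0" name 
format (modelled on the 002148_0 file in the stack trace) are assumptions made 
for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the naming conflict; not the actual FileSinkOp logic.
class TaskIdSketch {

    // Assumed scheme before the patch: the name depends only on the
    // partition id, so every attempt of that partition gets the same name.
    public static String nameFromPartitionId(int partitionId) {
        return String.format("%06d_0", partitionId);
    }

    // Assumed scheme after the patch: the name depends on the task attempt
    // id, which is expected to differ between attempts.
    public static String nameFromTaskAttemptId(long taskAttemptId) {
        return String.format("%06d_0", taskAttemptId);
    }

    public static void main(String[] args) {
        Set<String> outputFiles = new HashSet<>();

        // Two attempts of the same partition collide on one file name.
        outputFiles.add(nameFromPartitionId(2148));
        boolean retryIsNew = outputFiles.add(nameFromPartitionId(2148));
        System.out.println("partitionId retry got a new name: " + retryIsNew);

        // Two attempts with distinct (made-up) attempt ids do not collide.
        outputFiles.clear();
        outputFiles.add(nameFromTaskAttemptId(7001L));
        boolean attemptIsNew = outputFiles.add(nameFromTaskAttemptId(7002L));
        System.out.println("taskAttemptId retry got a new name: " + attemptIsNew);
    }
}
```

In this sketch the retried partition maps to a name that is already present, 
which mirrors the failed rename in the stack trace, while distinct attempt ids 
map to distinct names.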

> Hive insertion query execution fails on Hive on Spark
> -----------------------------------------------------
>
>                 Key: HIVE-15054
>                 URL: https://issues.apache.org/jira/browse/HIVE-15054
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-15054.1.patch
>
>
> The query {{insert overwrite table tbl1}} sometimes fails with the 
> following error. It seems we are constructing the taskAttemptId from the 
> partitionId, which is not unique when there are multiple attempts.
> {noformat}
> java.lang.IllegalStateException: Hit error while closing operators - failing 
> tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: 
> hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_task_tmp.-ext-10002/_tmp.002148_0
>  to: 
> hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_tmp.-ext-10002/002148_0
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
