[ 
https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15054:
----------------------------
    Status: Patch Available  (was: Open)

patch-3: switch to partitionId_attemptNumber. Spark will have a different 
taskId for the different attempts of the same task while hive needs to use the 
same id to figure out if the data are duplicate. So seems we have to use 
partitionId_attemptNumber here. 


> Hive insertion query execution fails on Hive on Spark
> -----------------------------------------------------
>
>                 Key: HIVE-15054
>                 URL: https://issues.apache.org/jira/browse/HIVE-15054
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-15054.1.patch, HIVE-15054.2.patch, 
> HIVE-15054.3.patch
>
>
> The query of {{insert overwrite table tbl1}} sometimes will fail with the 
> following errors. Seems we are constructing taskAttemptId with partitionId 
> which is not unique if there are multiple attempts.
> {noformat}
> ava.lang.IllegalStateException: Hit error while closing operators - failing 
> tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: 
> hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_task_tmp.-ext-10002/_tmp.002148_0
>  to: 
> hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_tmp.-ext-10002/002148_0
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to