[ https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-16156:
-------------------------------
    Description: 
If a task gets killed (for whatever reason) after it has renamed its temp 
output to the final output during commit, subsequent task attempts will fail 
during the rename because the target output already exists. This can happen, 
though rarely.
{code}
Job failed with org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:382)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 306 in stage 5.0 failed 4 times, most recent failure: Lost task 306.4 in stage 5.0 (TID 2956, hadoopworker1444-sjc1.prod.uber.internal): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
        at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
        at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
        at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
        at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:227)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$200(FileSinkOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1019)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:179)
        ... 15 more
{code}
Hive should check for the existence of the target output and delete it before 
renaming.
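
A minimal sketch of that check-then-delete, using the standard Hadoop 
FileSystem API. This is an illustration only, not the actual patch: the class 
and method names below are hypothetical, and fs, tmpPath, and finalPath stand 
for the file system handle and temp/final output paths already available at 
the commit site in FileSinkOperator.FSPaths.commit().
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper illustrating the proposed commit behavior; the real
// change would go into FileSinkOperator.FSPaths.commit().
public final class CommitRenameSketch {
  static void commitRename(FileSystem fs, Path tmpPath, Path finalPath)
      throws IOException {
    // A previous attempt of the same task may have finished this rename
    // before being killed, leaving the target in place. Delete it so the
    // rename below cannot fail on an existing destination.
    if (fs.exists(finalPath) && !fs.delete(finalPath, true)) {
      throw new IOException("Unable to delete existing target: " + finalPath);
    }
    if (!fs.rename(tmpPath, finalPath)) {
      throw new IOException("Unable to rename output from: " + tmpPath
          + " to: " + finalPath);
    }
  }
}
{code}
Deleting the leftover target should be safe here because every attempt of the 
same task writes its output under the same final name, so a retry simply 
replaces the output an earlier killed attempt already committed.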

  was:
If a task gets killed (for whatever reason) after it has renamed its temp 
output to the final output during commit, subsequent task attempts will fail 
during the rename because the target output already exists. This can happen, 
though rarely.

Hive should check for the existence of the target output and delete it before 
renaming.


> FileSinkOperator should delete existing output target when renaming
> -------------------------------------------------------------------
>
>                 Key: HIVE-16156
>                 URL: https://issues.apache.org/jira/browse/HIVE-16156
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-16156.patch
>
>
> If a task gets killed (for whatever reason) after it has renamed its temp 
> output to the final output during commit, subsequent task attempts will fail 
> during the rename because the target output already exists. This can happen, 
> though rarely.
> {code}
> Job failed with org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
> FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> java.util.concurrent.ExecutionException: Exception thrown by job
>       at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
>       at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
>       at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:382)
>       at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 306 in stage 5.0 failed 4 times, most recent failure: Lost task 306.4 in stage 5.0 (TID 2956, hadoopworker1444-sjc1.prod.uber.internal): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
>       at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
>       at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>       at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>       at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>       at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>       at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:227)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$200(FileSinkOperator.java:133)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1019)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:179)
>       ... 15 more
> {code}
> Hive should check for the existence of the target output and delete it 
> before renaming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
