[ https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903847#comment-15903847 ]

Xuefu Zhang commented on HIVE-16156:
------------------------------------

I'm not sure what the danger is. This is to fix a bug. If the bug is valid and 
the fix is reasonable, I don't see why we need to worry about other possible 
future bugs that might manifest. Such bugs demand fixes of their own rather 
than relying on this bug being left unfixed.

> FileSinkOperator should delete existing output target when renaming
> -------------------------------------------------------------------
>
>                 Key: HIVE-16156
>                 URL: https://issues.apache.org/jira/browse/HIVE-16156
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-16156.patch
>
>
> If a task gets killed (for whatever reason) after it has renamed the temp 
> output to the final output during commit, subsequent task attempts will fail 
> at the rename because the target output already exists. This can happen, 
> however rarely.
> {code}
> Job failed with org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
> FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> java.util.concurrent.ExecutionException: Exception thrown by job
>       at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
>       at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
>       at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:382)
>       at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 306 in stage 5.0 failed 4 times, most recent failure: Lost task 306.4 in stage 5.0 (TID 2956, hadoopworker1444-sjc1.prod.uber.internal): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
>       at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
>       at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>       at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>       at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>       at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>       at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:227)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$200(FileSinkOperator.java:133)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1019)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>       at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:179)
>       ... 15 more
> {code}
> Hive should check for the existence of the target output and delete it before 
> renaming.
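
The fix the description calls for amounts to a delete-if-exists before the rename; per the stack trace above, the commit happens in FileSinkOperator$FSPaths.commit(). Below is a minimal sketch against the Hadoop FileSystem API; the class and helper method names are illustrative only and are not the actual HIVE-16156.patch:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative helper only; the real change would live in
// FileSinkOperator$FSPaths.commit().
public final class CommitRename {
  static void renameOrReplace(FileSystem fs, Path tmpPath, Path finalPath)
      throws IOException {
    // A prior attempt may have completed its rename and then been killed;
    // the leftover target makes a plain rename() return false.
    if (fs.exists(finalPath) && !fs.delete(finalPath, true)) {
      throw new IOException("Unable to delete existing output: " + finalPath);
    }
    if (!fs.rename(tmpPath, finalPath)) {
      throw new IOException("Unable to rename output from: " + tmpPath
          + " to: " + finalPath);
    }
  }
}
{code}

Deleting first makes a retried attempt's commit idempotent: the retry simply replaces the identical output left behind by the killed attempt, which is presumably why the comment above sees no danger in the fix.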



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
