[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906108#comment-15906108
 ] 

Hive QA commented on HIVE-16156:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857437/HIVE-16156.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 10339 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4085/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4085/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4085/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857437 - PreCommit-HIVE-Build

> FileSinkOperator should delete existing output target when renaming
> ---
>
> Key: HIVE-16156
> URL: https://issues.apache.org/jira/browse/HIVE-16156
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16156.1.patch, HIVE-16156.2.patch, HIVE-16156.patch
>
>
> If a task gets killed (for whatever reason) after it has renamed the temp 
> output to the final output during commit, subsequent task attempts will fail 
> when renaming because the target output already exists. This can happen, 
> albeit rarely.
> {code}
> Job failed with org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
> FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.util.concurrent.ExecutionException: Exception thrown by job
>   at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
>   at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
>   at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:382)
>   at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 306 in stage 5.0 failed 4 times, most recent failure: Lost task 306.4 in stage 5.0 (TID 2956, hadoopworker1444-sjc1.prod.uber.internal): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
>   at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_task_tmp.-ext-10001/_tmp.000306_0 to: hdfs://nameservice1/tmp/hive-staging/xuefu_hive_2017-03-08_02-55-25_355_1482508192727176207-1/_tmp.-ext-10001/000306_0
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:227)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$200(FileSinkOperator.java:133)
>   at 
> {code}
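
To make the failure mode concrete: FileSinkOperator$FSPaths.commit (the deepest frame in the trace above) does a bare FileSystem.rename, and on HDFS rename() returns false when the destination file already exists, so a second attempt of the same task hits exactly the HiveException above. A minimal sketch of that commit pattern, with hypothetical names rather than the actual Hive code:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.metadata.HiveException;

public class CommitSketch {
  // Attempt 1 renames tmpPath -> finalPath during commit and is then killed;
  // attempt 2 of the same task re-runs this, rename() returns false because
  // finalPath already exists, and the HiveException above is thrown.
  static void commit(FileSystem fs, Path tmpPath, Path finalPath)
      throws IOException, HiveException {
    if (!fs.rename(tmpPath, finalPath)) {
      throw new HiveException("Unable to rename output from: " + tmpPath
          + " to: " + finalPath);
    }
  }
}
{code}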

[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906063#comment-15906063
 ] 

Hive QA commented on HIVE-16156:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857437/HIVE-16156.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10339 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=217)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4083/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4083/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4083/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857437 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906017#comment-15906017
 ] 

Hive QA commented on HIVE-16156:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857437/HIVE-16156.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 10339 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4081/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4081/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4081/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857437 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905865#comment-15905865
 ] 

Sergey Shelukhin commented on HIVE-16156:
-

+1 pending tests; the file status can be retrieved only if the rename fails; 
this can be changed on commit.
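
That is, keep the common path to a single rename() call and fetch the target's FileStatus only on the failure path. A sketch of that shape (hypothetical method and variable names; per the stack trace, the real change would land in FileSinkOperator$FSPaths.commit):

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.metadata.HiveException;

public class LazyStatusCommitSketch {
  static void commit(FileSystem fs, Path tmpPath, Path finalPath)
      throws IOException, HiveException {
    // Common path: one rename, no extra getFileStatus() round trip.
    if (fs.rename(tmpPath, finalPath)) {
      return;
    }
    FileStatus leftover = null;
    try {
      leftover = fs.getFileStatus(finalPath);  // fetched only after a failed rename
    } catch (FileNotFoundException e) {
      // Target does not exist, so the rename failed for some other reason.
    }
    if (leftover == null
        || !fs.delete(finalPath, false)        // drop the output of a killed attempt
        || !fs.rename(tmpPath, finalPath)) {   // retry once
      throw new HiveException("Unable to rename output from: " + tmpPath
          + " to: " + finalPath);
    }
  }
}
{code}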


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905853#comment-15905853
 ] 

Xuefu Zhang commented on HIVE-16156:


Patch #1 incorporated [~sershe]'s comment above.


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903863#comment-15903863
 ] 

Sergey Shelukhin commented on HIVE-16156:
-

I'm not saying this bug shouldn't be fixed. However, if there's a future bug, 
the current code would fail with an exception, while a silent delete could 
remove data or produce incorrect results.
So I suggest we add a log statement to this fix, since this is not a normal 
condition, and also add a sanity check: given that we expect to be deleting 
the output from another copy of the same task, comparing file sizes might be a 
good sanity check.
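
Something along these lines, purely illustrative (the size comparison and the warning are the suggestion here, not code from the attached patch; LOG stands for whatever logger the operator already has), continuing inside the failed-rename branch:

{code}
// Sanity check before deleting: the leftover target should have been written
// by another attempt of the same task, so it should be no larger than the
// temp output we are about to commit. Refuse to overwrite anything bigger.
FileStatus existing = fs.getFileStatus(finalPath);
long pendingLen = fs.getFileStatus(tmpPath).getLen();
if (existing.getLen() > pendingLen) {
  throw new HiveException("Existing target " + finalPath + " ("
      + existing.getLen() + " bytes) is larger than the new output ("
      + pendingLen + " bytes); refusing to delete it");
}
LOG.warn("Deleting leftover output " + finalPath
    + "; a previous attempt of this task likely committed it before being killed");
fs.delete(finalPath, false);
{code}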


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903847#comment-15903847
 ] 

Xuefu Zhang commented on HIVE-16156:


I'm not sure what the danger is. This is a bug fix. If the bug is valid and 
the fix is reasonable, I'm not sure why we need to worry about other possible 
future bugs that might manifest. Such bugs demand their own fixes rather than 
relying on this bug remaining unfixed.


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903822#comment-15903822
 ] 

Sergey Shelukhin commented on HIVE-16156:
-

Hmm... it seems dangerous to delete that file - shouldn't we at least make 
sure the file we are replacing is no larger than the new output? And log a 
warning. If there's an independent bug that causes a path collision, this 
could nuke the file silently.


[jira] [Commented] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903646#comment-15903646
 ] 

Hive QA commented on HIVE-16156:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857033/HIVE-16156.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10335 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=137)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4054/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4054/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4054/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857033 - PreCommit-HIVE-Build
