[ https://issues.apache.org/jira/browse/SPARK-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345519#comment-14345519 ]

Jason Hubbard commented on SPARK-6067:
--------------------------------------

Hi baishuo. To reproduce:

1. Create a partitioned Hive table.
2. Create a HiveContext that connects to the metastore containing that table.
3. Build a SchemaRDD holding the data to load, and run a SQL statement in the
   HiveContext that inserts that data into the table.
4. While the InsertIntoHiveTable job is running, kill one of its tasks.

The killed task is retried, but because the folder for its partition was
already created by the first attempt, every retry fails until the whole job
fails. I have uploaded the full stack trace.
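
A minimal repro sketch follows, assuming Spark 1.2.x built with Hive support
and a reachable Hive metastore; the table, column, and object names are
hypothetical, and the kill in step 4 can be done by killing the executor JVM
running one of the insert tasks.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical row type; defined at top level so Spark SQL's reflection
// can derive a schema from it.
case class Event(value: String, day: String)

object Spark6067Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-6067-repro"))

    // Step 2: a HiveContext connected to the metastore.
    val hc = new HiveContext(sc)
    import hc._ // brings in sql() and the RDD -> SchemaRDD implicit

    // Step 1: a partitioned Hive table registered in the metastore.
    sql("CREATE TABLE IF NOT EXISTS events (value STRING) PARTITIONED BY (day STRING)")

    // Dynamic partitioning must be enabled for the insert below.
    sql("SET hive.exec.dynamic.partition=true")
    sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Step 3: a SchemaRDD with the data to load, exposed as a temp table.
    val src = sc.parallelize(1 to 1000000)
      .map(i => Event("v" + i, f"2015-02-${i % 28 + 1}%02d"))
    src.registerTempTable("src")

    // Step 4: the dynamic-partition insert; kill one of its tasks while it
    // runs. The retried attempt then dies with "... already exists".
    sql("INSERT INTO TABLE events PARTITION (day) SELECT value, day FROM src")
  }
}
{code}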

> Spark sql hive dynamic partitions job will fail if task fails
> -------------------------------------------------------------
>
>                 Key: SPARK-6067
>                 URL: https://issues.apache.org/jira/browse/SPARK-6067
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Jason Hubbard
>            Priority: Minor
>         Attachments: job.log
>
>
> When inserting into a Hive table from Spark SQL with dynamic partitioning, a 
> single task failure causes every retry of that task to fail as well, until 
> the whole job fails:
> /mytable/.hive-staging_hive_2015-02-27_11-53-19_573_222-3/-ext-10000/partition=2015-02-04/part-00001
>  for client <ip> already exists
> The retried task may need to clean up the output written by the previously 
> failed attempt before writing to the same location; a sketch of that idea 
> follows below.
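
The cleanup hinted at above could look roughly like the sketch below: before a
retried attempt opens its output file, remove whatever the failed attempt left
behind. This is only an illustration of the idea, not the actual fix;
prepareTaskOutput, path, and conf are assumed names.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object TaskOutputCleanup {
  // Delete any file a previous attempt of this task left at `path`, so the
  // retry's create() is not rejected by HDFS with "... already exists".
  def prepareTaskOutput(path: Path, conf: Configuration): Unit = {
    val fs = path.getFileSystem(conf)
    if (fs.exists(path)) {
      fs.delete(path, false) // non-recursive: only the stale part file
    }
  }
}
{code}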


