Re: Get rid of FileAlreadyExistsError

2016-03-01 Thread Peter Halliday
I haven’t tried spark.hadoop.validateOutputSpecs. However, it seems that setting has to do with the existence of the output directory itself, not the individual files. Maybe I’m wrong?
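For reference, that setting is passed as an ordinary Spark conf entry. Here is a minimal sketch, assuming PySpark (the app name is made up); note it only disables the pre-write check on the output directory, which matches the distinction drawn above:

```python
from pyspark import SparkConf, SparkContext  # assumes a PySpark install

# spark.hadoop.* entries are forwarded into the underlying Hadoop
# Configuration. validateOutputSpecs=false skips the check that the
# output directory does not already exist; it says nothing about
# overwriting individual part files.
conf = (SparkConf()
        .setAppName("validate-output-specs-sketch")  # hypothetical name
        .set("spark.hadoop.validateOutputSpecs", "false"))
sc = SparkContext(conf=conf)
```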

Peter



> On Mar 1, 2016, at 11:53 AM, Sabarish Sasidharan wrote:
> 
> Have you tried spark.hadoop.validateOutputSpecs?
> 
> On 01-Mar-2016 9:43 pm, "Peter Halliday" wrote:
> http://pastebin.com/vbbFzyzb 
> 
> The problem seems to be twofold.  First, the ParquetFileWriter in Hadoop 
> supports an overwrite flag that Spark doesn’t allow to be set.  Second, the 
> DirectParquetOutputCommitter has an abortTask that’s empty.  I see SPARK-8413 
> open on this too, but no plans to change it.  I’m surprised this hasn’t been 
> fixed yet.
> 
> Peter Halliday 
> 
> 
> 
>> On Mar 1, 2016, at 10:01 AM, Ted Yu wrote:
>> 
>> Do you mind pastebin'ning the stack trace with the error so that we know 
>> which part of the code is under discussion?
>> 
>> Thanks
>> 
>> On Tue, Mar 1, 2016 at 7:48 AM, Peter Halliday wrote:
>> I have a Spark application in which a task appears to fail, but it actually 
>> did write out some of the files assigned to it.  Spark then assigns the task 
>> to another executor, which gets a FileAlreadyExistsException.  The Hadoop 
>> code seems to allow files to be overwritten, but I see the 1.5.1 version of 
>> this code doesn’t allow that flag to be passed in.  Is that correct?
>> 
>> Peter Halliday
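The failure mode in the quoted message can be imitated outside Spark with exclusive-create file semantics. This is a plain-Python analogy, not Spark's actual writer: FileExistsError stands in for Hadoop's FileAlreadyExistsException, and the overwrite flag stands in for the one ParquetFileWriter accepts but Spark 1.5.1 does not expose:

```python
import os
import tempfile

def write_part(path, data, overwrite=False):
    # O_EXCL mirrors the exclusive create a retried task runs into;
    # O_TRUNC mirrors what an exposed overwrite flag would permit.
    flags = os.O_WRONLY | os.O_CREAT | (os.O_TRUNC if overwrite else os.O_EXCL)
    with os.fdopen(os.open(path, flags), "w") as f:
        f.write(data)

out = os.path.join(tempfile.mkdtemp(), "part-00000")
write_part(out, "attempt 1")             # first attempt writes, then "fails"
try:
    write_part(out, "attempt 2")         # retry trips over the leftover file
except FileExistsError:
    print("retry hit an existing file")  # the FileAlreadyExistsError analogue
write_part(out, "attempt 2", overwrite=True)  # what an overwrite flag would allow
```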
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
>> For additional commands, e-mail: user-h...@spark.apache.org 
> 
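On the empty abortTask mentioned above: the kind of cleanup such a hook would typically perform can be sketched as follows. This is a hypothetical illustration in plain Python, not the actual DirectParquetOutputCommitter code:

```python
import os
import tempfile

def abort_task(attempt_files):
    # A non-empty abortTask would delete whatever the failed attempt wrote,
    # so the retried attempt starts clean instead of hitting
    # FileAlreadyExistsException on the leftovers.
    for path in attempt_files:
        if os.path.exists(path):
            os.remove(path)

workdir = tempfile.mkdtemp()
leftovers = [os.path.join(workdir, name) for name in ("part-00000", "part-00001")]
for path in leftovers:
    open(path, "w").close()   # the failed attempt left these behind

abort_task(leftovers)         # the cleanup an abort hook could run
print(any(os.path.exists(p) for p in leftovers))  # → False
```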


