Re: Spark with S3 DirectOutputCommitter

2016-09-13 Thread Steve Loughran
On 12 Sep 2016, at 19:58, Srikanth wrote: Thanks Steve! We are already using HDFS as an intermediate store. This is for the last stage of processing which has to put data in S3. The output is partitioned by 3 fields, like

Re: Spark with S3 DirectOutputCommitter

2016-09-12 Thread Srikanth
Thanks Steve! We are already using HDFS as an intermediate store. This is for the last stage of processing, which has to put data in S3. The output is partitioned by 3 fields, like .../field1=111/field2=999/date=-MM-DD/* Given that there are 100s of folders and 1000s of subfolders and part

Re: Spark with S3 DirectOutputCommitter

2016-09-11 Thread Steve Loughran
> On 9 Sep 2016, at 21:54, Srikanth wrote:
>
> Hello,
>
> I'm trying to use DirectOutputCommitter for s3a in Spark 2.0. I've tried a
> few configs and none of them seem to work.
> Output always creates _temporary directory. Rename is killing performance.
> I read some

Spark with S3 DirectOutputCommitter

2016-09-09 Thread Srikanth
Hello, I'm trying to use DirectOutputCommitter for s3a in Spark 2.0. I've tried a few configs and none of them seem to work. The output always creates a _temporary directory. Rename is killing performance. I read some notes about DirectOutputCommitter causing problems with speculation turned on. Was