On 12 Sep 2016, at 19:58, Srikanth wrote:
Thanks Steve!
We are already using HDFS as an intermediate store. This is for the last stage
of processing which has to put data in S3.
The output is partitioned by 3 fields, like
.../field1=111/field2=999/date=YYYY-MM-DD/*
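
For illustration, here is a minimal Python sketch of how a key=value partition path like the one above is put together. The function name, the example bucket, and the assumption that the date is written as ISO YYYY-MM-DD are all hypothetical; the real base location is elided in the message and is not filled in here.

```python
from datetime import date


def partition_path(base: str, field1: int, field2: int, day: date) -> str:
    """Build a key=value partition directory, one path segment per field."""
    segments = [
        f"field1={field1}",
        f"field2={field2}",
        f"date={day.isoformat()}",  # assumption: date is stored as YYYY-MM-DD
    ]
    # Strip a trailing slash from the base so the result has no double slash.
    return "/".join([base.rstrip("/"), *segments])


# Hypothetical base location, used only for this example.
print(partition_path("s3a://example-bucket/output", 111, 999, date(2016, 9, 12)))
```

Each distinct combination of the three values maps to its own directory, so the number of directories grows with the product of the distinct values per field.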
Given that there are 100s of folders and 1000s of subfolders and part
> On 9 Sep 2016, at 21:54, Srikanth wrote:
Hello,
I'm trying to use DirectOutputCommitter for s3a in Spark 2.0. I've tried a
few configs and none of them seem to work.
The output always creates a _temporary directory, and the rename is killing performance.
I read some notes about DirectOutputCommitter causing problems with
speculation turned on. Was