Hi,
I stopped using s3n a long time ago. If you are working with Parquet and
writing files, s3a is the only alternative that avoids failures.
Otherwise, why not just use s3://?
Regards,
Gourav
On Wed, Apr 13, 2016 at 12:17 PM, Steve Loughran
wrote:
On 12 Apr 2016, at 22:05, Martin Eden
wrote:
Hi everyone,
Running on EMR 4.3 with Spark 1.6.0 and the provided S3N native driver, I
manage to process approx. 1 TB of strings inside gzipped Parquet in about 50
mins on a 20-node cluster (8 cores, 60 GB RAM). That's about 17 MB/sec
per node.
This seems suboptimal.
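For reference, the per-node figure above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and an even spread of work across nodes, and the function name is made up for illustration.

```python
# Rough check of the quoted throughput.
# Assumptions (not from the original mail): decimal units and an even
# distribution of the data across all nodes.
TOTAL_BYTES = 1 * 10**12   # approx. 1 TB of data
DURATION_S = 50 * 60       # about 50 minutes
NODES = 20                 # 20-node cluster


def per_node_mb_per_s(total_bytes, duration_s, nodes):
    """Average throughput per node in MB/s (hypothetical helper)."""
    return total_bytes / duration_s / nodes / 10**6


rate = per_node_mb_per_s(TOTAL_BYTES, DURATION_S, NODES)
print(f"{rate:.1f} MB/s per node")  # prints "16.7 MB/s per node"
```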
The processing is very