Re: S3n performance (@AaronDavidson)

2016-04-13 Thread Gourav Sengupta
Hi, I stopped using s3n a long time ago. If you are working with Parquet and writing files, s3a is the only alternative to failures. Otherwise, why not just use s3://? Regards, Gourav On Wed, Apr 13, 2016 at 12:17 PM, Steve Loughran wrote: > > On 12

Re: S3n performance (@AaronDavidson)

2016-04-13 Thread Steve Loughran
On 12 Apr 2016, at 22:05, Martin Eden wrote: Hi everyone, Running on EMR 4.3 with Spark 1.6.0 and the provided S3N native driver I manage to process approx. 1 TB of strings inside gzipped Parquet in about 50 mins on a 20-node cluster (8

S3n performance (@AaronDavidson)

2016-04-12 Thread Martin Eden
Hi everyone, Running on EMR 4.3 with Spark 1.6.0 and the provided S3N native driver I manage to process approx. 1 TB of strings inside gzipped Parquet in about 50 mins on a 20-node cluster (8 cores, 60 GB RAM). That's about 17 MB/s per node. This seems suboptimal. The processing is very
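
For reference, the per-node figure quoted above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and an even spread of data across the 20 nodes, and the function name is made up for illustration.

```python
def per_node_throughput_mb_s(total_bytes, minutes, nodes):
    """Average per-node throughput in MB/s (hypothetical helper).

    Assumes the data is spread evenly across nodes and 1 MB = 10**6 bytes.
    """
    seconds = minutes * 60
    return total_bytes / seconds / nodes / 1e6


# Figures from the message: approx. 1 TB, about 50 minutes, 20 nodes.
TOTAL_BYTES = 1e12  # assumption: decimal terabyte
rate = per_node_throughput_mb_s(TOTAL_BYTES, minutes=50, nodes=20)
print(f"{rate:.1f} MB/s per node")  # prints "16.7 MB/s per node"
```

That rounds to the roughly 17 MB/s per node stated in the message. With 8 cores per node, the same assumptions give about 2 MB/s per core.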