Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-02 Thread Neil Jonkers
Hi, Can you set the following parameters in your mapred-site.xml file, please: mapred.output.direct.EmrFileSystem = true and mapred.output.direct.NativeS3FileSystem = true. You can also configure this at cluster launch time with the following Classification via EMR console:
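
For reference, the two mapred-site.xml properties named in this message can be rendered in the usual Hadoop `<property>` layout. This is a minimal sketch, not part of the original message: the helper name `to_hadoop_properties` is hypothetical, and only the property names and values come from the thread. The classification text itself is cut off in the archive and is not reconstructed here.

```python
import xml.etree.ElementTree as ET

# Property names and values exactly as given in the message above.
DIRECT_OUTPUT_PROPS = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def to_hadoop_properties(props):
    """Build a <configuration> element holding one <property> per entry.

    Hypothetical helper for illustration; it only formats the values and
    does not touch any cluster.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    # Print the fragment so it can be compared against mapred-site.xml.
    print(ET.tostring(to_hadoop_properties(DIRECT_OUTPUT_PROPS), encoding="unicode"))
```

The output is only a fragment to merge into an existing mapred-site.xml by hand; whether these settings suit a given cluster is something to verify against the EMR release in use.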

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-02 Thread Alexander Pivovarov
Hi Neil, Yes, it helps! I no longer see _temporary in the console output, and saveAsTextFile is fast now. 2015-09-02 23:07:00,022 INFO [task-result-getter-0] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 18.0 in stage 0.0 (TID 18) in 4398 ms on ip-10-0-24-103.ec2.internal

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-01 Thread Alexander Pivovarov
Should I use DirectOutputCommitter? spark.hadoop.mapred.output.committer.class com.appsflyer.spark.DirectOutputCommitter On Tue, Sep 1, 2015 at 4:01 PM, Alexander Pivovarov wrote: > I run Spark 1.4.1 on Amazon AWS EMR 4.0.0 > > For some reason spark saveAsTextFile is
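
The setting quoted in this message uses Spark's `spark.hadoop.` prefix, which passes the rest of the key through to the Hadoop configuration. A small sketch of how such a key could be turned into `spark-submit --conf` arguments follows. The helper name `spark_submit_conf_args` is hypothetical; the committer class name is taken from the message, and the sketch assumes that class is already on the job's classpath. It does not settle whether this committer is the right choice on emr-4.0.0.

```python
# Hadoop-side key and value as quoted in the message above.
COMMITTER_PROPS = {
    "mapred.output.committer.class": "com.appsflyer.spark.DirectOutputCommitter",
}


def spark_submit_conf_args(hadoop_props):
    """Turn Hadoop properties into spark-submit arguments.

    Each key gets the spark.hadoop. prefix and is emitted as a
    '--conf key=value' pair. Hypothetical helper for illustration only.
    """
    args = []
    for key, value in hadoop_props.items():
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


if __name__ == "__main__":
    # Show the arguments as they would appear on a command line.
    print(" ".join(spark_submit_conf_args(COMMITTER_PROPS)))
```

The same key/value pair could equally go into spark-defaults.conf as a single whitespace-separated line, which is the form quoted in the message.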

spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-01 Thread Alexander Pivovarov
I run Spark 1.4.1 on Amazon AWS EMR 4.0.0. For some reason, Spark saveAsTextFile is very slow on emr-4.0.0 compared to emr-3.8 (it was 5 sec, now 95 sec). Actually, saveAsTextFile reports that it is done in 4.356 sec, but after that I see lots of INFO messages with 404 errors from com.amazonaws.latency

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-01 Thread Alexander Pivovarov
I checked the previous EMR config (emr-3.8). Its mapred-site.xml has the following setting: mapred.output.committer.class = org.apache.hadoop.mapred.DirectFileOutputCommitter On Tue, Sep 1, 2015 at 7:33 PM, Alexander Pivovarov wrote: > Should I use DirectOutputCommitter? >