Hi,

I ran some jobs with Spark 2.0 on YARN and found that all tasks finished very
quickly, but in the last step Spark spent a lot of time renaming or moving data
from the S3 temporary directory to the final directory. I then tried setting

spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
or
spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter

but neither works; it looks like Spark 2.0 removed these configs. How can I
make Spark write output directly, without a temporary directory?
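For reference, this is roughly how I passed the setting at submit time (a sketch; `my_job.py` stands in for my actual application):

```shell
# Passing the committer config at submit time, Spark 1.x style.
# DirectParquetOutputCommitter was removed in Spark 2.0, so this
# setting appears to be silently ignored there.
spark-submit \
  --master yarn \
  --conf spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter \
  my_job.py
```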
