Maybe you are hitting the FileOutputCommitter vs. DirectCommitter problem
while working with S3? Do you have HDFS available so you can write there and
verify?
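
If it is the committer, a quick way to check (just a sketch; the paths, bucket
and app name are placeholders, and the v2 commit algorithm setting assumes a
Hadoop 2.7+ classpath) is to write the same RDD to HDFS and to S3 and compare
the write stage times in the UI:

    // Sketch only: compare write times on HDFS vs. S3 to see whether the
    // S3 commit (the copy out of _temporary) is the bottleneck.
    // Paths, bucket and app name are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("commit-comparison")
      // Assumption: Hadoop 2.7+ on the classpath; commit algorithm v2 moves
      // each task's rename into task commit instead of one serial job commit.
      .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs:///input/events")
    data.saveAsTextFile("hdfs:///tmp/commit-test")      // baseline on HDFS
    data.saveAsTextFile("s3n://my-bucket/commit-test")  // same write against S3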

Committing on S3 copies all partitions one by one from _temporary to your
final destination bucket, so this stage can become a bottleneck (which is why
reducing the number of partitions might "solve" it).
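
For example (a sketch only; the socket source, port, batch interval, partition
count and bucket below are all placeholder values), coalescing before the
write cuts down the number of files the committer has to copy out of
_temporary:

    // Sketch: fewer output partitions means fewer files for the committer
    // to copy from _temporary to the final S3 prefix. Source, port, batch
    // interval, partition count and bucket are placeholders.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("s3-write-demo"), Seconds(60))
    val dstream = ssc.socketTextStream("localhost", 9999)

    dstream.foreachRDD { (rdd, time) =>
      rdd.coalesce(8)  // assumption: 8 is an arbitrary example value
         .saveAsTextFile(s"s3n://my-bucket/output/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()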
There was a thread on this mailing list about this problem a few weeks ago.



On 7 March 2016 at 23:53, Andy Davidson <a...@santacruzintegration.com>
wrote:

> We just deployed our first streaming apps. The next step is getting them to
> run reliably.
>
> We have spent a lot of time reading the various programming guides and
> looking at the standalone cluster manager's app performance web pages.
>
> Looking at the streaming tab and the stages tab has been the most helpful
> in tuning our app. However, we do not understand how memory and the number
> of cores affect throughput and performance. Usually adding memory is the
> cheapest way to improve performance.
>
> When we have a single receiver, we call spark-submit with
> --total-executor-cores 2. Changing the value does not seem to change
> throughput. Our bottleneck was S3 write time, saveAsTextFile(). Reducing
> the number of partitions dramatically reduces S3 write times.
>
> Adding memory also does not improve performance.
>
> *I would think that adding more cores would allow more concurrent tasks to
> run. That is to say, reducing the number of partitions would slow things
> down.*
>
> What are best practices?
>
> Kind regards
>
> Andy
>
