We just deployed our first streaming apps. The next step is making sure
they run reliably.

We have spent a lot of time reading the various programming guides and
looking at the standalone cluster manager's application performance web pages.

The streaming tab and the stages tab have been the most helpful in tuning
our app. However, we do not understand how memory and the number of cores
affect throughput and performance. Usually adding memory is the cheapest
way to improve performance.

When we have a single receiver, we call spark-submit with
--total-executor-cores 2. Changing the value does not seem to change
throughput. Our bottleneck was S3 write time in saveAsTextFile().
Reducing the number of partitions dramatically reduces S3 write times.
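For context, our write path looks roughly like this (a simplified sketch;
the socket source, batch interval, partition count, and bucket path are
placeholders, not our real values):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object S3WriterSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("s3-writer-sketch")
        val ssc = new StreamingContext(conf, Seconds(10))

        // placeholder source; the real app uses a different receiver
        val lines = ssc.socketTextStream("localhost", 9999)

        lines.foreachRDD { (rdd, time) =>
          // coalescing down to a handful of partitions before the write
          // is what dramatically reduced our S3 write times
          rdd.coalesce(4)
             .saveAsTextFile(s"s3n://example-bucket/output/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }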

Adding memory also does not improve performance.

I would have thought that adding more cores would allow more tasks to run
concurrently, and by the same logic that reducing the number of partitions
would slow things down, yet with our S3 writes we observe the opposite.
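Our mental model, which may well be wrong, is that each stage runs its
tasks in waves of size equal to the total executor cores. The
back-of-the-envelope arithmetic (with made-up numbers, runnable in the
Scala REPL) would be:

    // assumed model (possibly wrong): tasks per stage run in "waves"
    // whose size equals the total executor cores
    val numPartitions = 64  // illustrative, not our real value
    val totalCores    = 2   // what we pass to --total-executor-cores
    val waves = math.ceil(numPartitions.toDouble / totalCores).toInt
    println(s"$waves waves of tasks per stage")  // prints: 32 waves of tasks per stage

Is that the right way to think about it?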

What are best practices?

Kind regards

Andy
