Re: Tuning Spark Streaming jobs

2014-12-23 Thread Timothy Chen
Hi Gerard,

SPARK-4286 is the ticket I am working on. Besides supporting the shuffle
service, it also supports the executor scaling callbacks (kill/request
total) for coarse-grained mode.

I created SPARK-4940 to discuss the distribution problem further; let's
move our discussion there.

Tim



On Dec 22, 2014, at 11:16 AM, Gerard Maas gerard.m...@gmail.com wrote:

Hi Tim,

That would be awesome. We have seen some really disparate Mesos allocations
for our Spark Streaming jobs (for example, (7,4,1) over 3 executors for 4
Kafka consumers, instead of the ideal (3,3,3,3)).
For network-dependent consumers, an even deployment would make streaming job
performance reliable and reproducible.
We're deploying in coarse-grained mode. We are not sure Spark Streaming would
work well in fine-grained mode, given the added latency to acquire a worker.
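
To make the imbalance above concrete, here is a minimal Python sketch. It is not part of the original thread, and the helper names `even_split` and `spread` are hypothetical. It only compares the observed allocation with an even split of the same number of cores over 4 executors; it does not model how Mesos actually makes offers.

```python
from typing import List


def even_split(total_cores: int, executors: int) -> List[int]:
    """Spread total_cores over executors as evenly as possible."""
    if executors <= 0:
        raise ValueError("executors must be positive")
    base, extra = divmod(total_cores, executors)
    # The first `extra` executors receive one additional core.
    return [base + 1 if i < extra else base for i in range(executors)]


def spread(allocation: List[int]) -> int:
    """Gap between the most and the least loaded executor."""
    return max(allocation) - min(allocation)


if __name__ == "__main__":
    observed = [7, 4, 1]                  # allocation reported in the thread
    ideal = even_split(sum(observed), 4)  # same 12 cores over 4 executors
    print(observed, "spread:", spread(observed))
    print(ideal, "spread:", spread(ideal))
```

A spread of zero corresponds to the ideal case mentioned above, where every consumer gets the same share of cores.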

You mention that you're changing the Mesos scheduler. Is there a JIRA ticket
where this work is taking place?

-kr, Gerard.


On Mon, Dec 22, 2014 at 6:01 PM, Timothy Chen tnac...@gmail.com wrote:

 Hi Gerard,

 Really nice guide!

 I'm particularly interested in the Mesos scheduling side, to distribute
 cores more evenly across the cluster.

 Are you using coarse-grained mode or fine-grained mode?

 I'm making changes to the Spark Mesos scheduler, and I think we can propose
 the best way to achieve what you mentioned.

 Tim


  On Dec 22, 2014, at 8:33 AM, Gerard Maas gerard.m...@gmail.com wrote:
 
  Hi,
 
  After facing issues with the performance of some of our Spark Streaming
  jobs, we invested quite some effort figuring out the factors that affect
  the performance characteristics of a Streaming job. We defined an
  empirical model that helps us reason about Streaming jobs and applied it to
  tune the jobs in order to maximize throughput.
 
  We have summarized our findings in a blog post with the intention of
  collecting feedback and hoping that it is useful to other Spark Streaming
  users facing similar issues.
 
  http://www.virdata.com/tuning-spark/
 
  Your feedback is welcome.
 
  With kind regards,
 
  Gerard.
  Data Processing Team Lead
  Virdata.com
  @maasg


