Hi Tim,

First of all, let me wish you a happy and fulfilling New Year. Sorry for the delay in my response; I was out for the Christmas break.
I've added my thoughts to the ticket from the perspective of a streaming job. @TD: What do you think?

-kr,
Gerard.

On Tue, Dec 23, 2014 at 8:02 PM, Timothy Chen <tnac...@gmail.com> wrote:

> Hi Gerard,
>
> SPARK-4286 is the ticket I am working on. Besides supporting the shuffle
> service, it also supports the executor scaling callbacks (kill/request
> total) for coarse-grained mode.
>
> I created SPARK-4940 to discuss the distribution problem further; let's
> continue our discussion there.
>
> Tim
>
>
> On Dec 22, 2014, at 11:16 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
>
> Hi Tim,
>
> That would be awesome. We have seen some really uneven Mesos
> allocations for our Spark Streaming jobs (for example, (7,4,1) over 3
> executors for 4 Kafka consumers instead of the ideal (3,3,3,3)).
> For network-dependent consumers, achieving an even deployment would
> make streaming job execution reliable and reproducible from a
> performance point of view.
> We're deploying in coarse-grained mode. I'm not sure Spark Streaming would
> work well in fine-grained mode, given the added latency to acquire a worker.
>
> You mention that you're changing the Mesos scheduler. Is there a JIRA
> ticket where this work is taking place?
>
> -kr,
> Gerard.
>
>
> On Mon, Dec 22, 2014 at 6:01 PM, Timothy Chen <tnac...@gmail.com> wrote:
>
>> Hi Gerard,
>>
>> Really nice guide!
>>
>> I'm particularly interested in the Mesos scheduling side, to distribute
>> cores more evenly across the cluster.
>>
>> Are you using coarse-grained mode or fine-grained mode?
>>
>> I'm making changes to the Spark Mesos scheduler, and I think we can
>> propose the best way to achieve what you mentioned.
>>
>> Tim
>>
>> Sent from my iPhone
>>
>> > On Dec 22, 2014, at 8:33 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > After facing issues with the performance of some of our Spark Streaming
>> > jobs, we invested considerable effort in figuring out the factors that
>> > affect the performance characteristics of a Streaming job. We defined an
>> > empirical model that helps us reason about Streaming jobs and applied it
>> > to tune the jobs in order to maximize throughput.
>> >
>> > We have summarized our findings in a blog post, with the intention of
>> > collecting feedback and in the hope that it is useful to other Spark
>> > Streaming users facing similar issues.
>> >
>> > http://www.virdata.com/tuning-spark/
>> >
>> > Your feedback is welcome.
>> >
>> > With kind regards,
>> >
>> > Gerard.
>> > Data Processing Team Lead
>> > Virdata.com
>> > @maasg