On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee <a...@mcelwee.me> wrote:
> I've used fine-grained mode on our mesos spark clusters until this week,
> mostly because it was the default. I started trying coarse-grained because
> of the recent chatter on the mailing list about wanting to move the mesos
> execution path to coarse-grained only. The odd thing is, coarse-grained vs
> fine-grained seems to yield drastically different cluster utilization
> metrics for any of our jobs that I've tried out this week.
>
> If this is best as a new thread, please let me know, and I'll try not to
> derail this conversation. Otherwise, details below:

I think it's ok to discuss it here.

> We monitor our spark clusters with ganglia, and historically, we maintain
> at least 90% cpu utilization across the cluster. Making a single
> configuration change to use coarse-grained execution instead of
> fine-grained consistently yields a cpu utilization pattern that starts
> around 90% at the beginning of the job, and then slowly decreases over
> the next 1-1.5 hours to level out around 65% cpu utilization on the
> cluster. Does anyone have a clue why I'd be seeing such a negative effect
> of switching to coarse-grained mode? GC activity is comparable in both
> cases. I've tried 1.5.2, as well as the 1.6.0 preview tag that's on github.

I'm not very familiar with Ganglia, and how it computes utilization. But one
thing comes to mind: did you enable dynamic allocation
<https://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos>
on coarse-grained mode?

iulian
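
For anyone following along, a minimal sketch of the configuration the linked
docs describe (property names are from the Spark docs; treating the exact
setup as an assumption, since it also requires the Mesos external shuffle
service running on each agent):

```
# spark-defaults.conf -- sketch of dynamic allocation on Mesos coarse-grained
# Assumes the external shuffle service has been started on every agent,
# e.g. via sbin/start-mesos-shuffle-service.sh in the Spark distribution.
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true
```

Without dynamic allocation, coarse-grained mode holds its executors (and
their CPUs) for the lifetime of the job even when tasks aren't running,
which could show up as lower measured utilization.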