I am running Spark 1.4.1 on Mesos. The Spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of sizes 100, 100, 7 and 1 respectively. Let's call the result productRDD.
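To make the blow-up concrete, here is a plain-Java sketch (no Spark dependency; the element values are placeholders) that emulates chaining JavaRDD.cartesian() three times — every pairing is materialized, so the RDD sizes multiply:

```java
import java.util.ArrayList;
import java.util.List;

public class CartesianSize {
    // Emulates JavaRDD.cartesian(): pairs every element of 'left'
    // with every element of 'right', so sizes multiply.
    static <A, B> List<Object[]> cartesian(List<A> left, List<B> right) {
        List<Object[]> out = new ArrayList<>();
        for (A a : left)
            for (B b : right)
                out.add(new Object[] { a, b });
        return out;
    }

    public static void main(String[] args) {
        // Dummy stand-ins for aRdd, bRdd, cRdd, dRdd with the stated sizes.
        List<Integer> aRdd = new ArrayList<>(), bRdd = new ArrayList<>(),
                      cRdd = new ArrayList<>(), dRdd = new ArrayList<>();
        for (int i = 0; i < 100; i++) { aRdd.add(i); bRdd.add(i); }
        for (int i = 0; i < 7; i++) cRdd.add(i);
        dRdd.add(0);

        // 100 * 100 * 7 * 1 = 70,000 records before the transformation runs.
        int size = cartesian(cartesian(cartesian(aRdd, bRdd), cRdd), dRdd).size();
        System.out.println(size); // prints 70000
    }
}
```

So even with small inputs, the cartesian produces 70,000 records that the downstream transformation and saveAsTextFile have to chew through.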
Creation of "aRdd" needs a data pull from multiple data sources, merging them, and building a JavaRDD of tuples, so aRdd ends up looking something like JavaRDD<Tuple2<A1, A2>>. bRdd, cRdd and dRdd are just List<>s of values. I then apply a transformation on productRDD and finally call "saveAsTextFile" to save the result of my transformation.

Problem: With "spark.mesos.coarse=false" (fine-grained mode), creation of "aRdd" works fine but the driver crashes while doing the cartesian; with "spark.mesos.coarse=true", the job works like a charm.

Comments: I wanted to understand what role "spark.mesos.coarse=true" plays in terms of memory and compute performance. My findings look counterintuitive:

1. "spark.mesos.coarse=true" runs everything in just one long-lived Mesos task, so it is fine-grained mode that carries the overhead of spinning up a Mesos task per Spark task — which should impact performance, not crash the driver.

2. What setting of "spark.mesos.coarse" is recommended for running Spark on Mesos? Or is there no best answer and it depends on the use case?

3. Also, with "spark.mesos.coarse=true", I notice that I get huge GC pauses even with a small dataset on a long-running job (but this can be a separate discussion).

Let me know if I am missing something obvious; we are learning Spark tuning as we move forward :)

--
Thanks,
-Utkarsh
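For reference, a sketch of how the two modes are selected at submit time (the master host, job class, jar name, and core cap below are placeholders, not taken from the job above):

```shell
# Coarse-grained mode: Spark holds one long-lived Mesos task per slave for the
# whole job, so executors (and their heaps) live for the job's duration —
# which also explains why GC pressure accumulates on long-running jobs.
# In Spark 1.4 the default is spark.mesos.coarse=false (fine-grained), where
# each Spark task launches as its own short-lived Mesos task.
spark-submit \
  --master mesos://master-host:5050 \
  --conf spark.mesos.coarse=true \
  --conf spark.cores.max=8 \
  --class com.example.ProductJob \
  product-job.jar
```

In coarse mode, spark.cores.max is worth setting explicitly, since otherwise Spark will hold on to all cores Mesos offers it for the lifetime of the job.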