I forgot to reply-all.


spark.mesos.coarse = true doesn't work, but spark.mesos.coarse = false works
(sorry, there was a typo in my last email; I meant: when I set
"spark.mesos.coarse=false", the job works like a charm).

I get this exception with spark.mesos.coarse = true:

15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id"
: "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" :
"55af5a61e8a42806f47546c1"}, max= null
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)
20:18:17 INFO SparkContext: Invoking stop() from shutdown hook
20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
some-ip-here:37706 in memory (size: 1964.0 B, free: 2.8 GB)
20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on mesos-slave10
in memory (size: 1964.0 B, free: 5.2 GB)
20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
some-ip-here:37706 in memory (size: 17.2 KB, free: 2.8 GB)
20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
mesos-slave105 in memory (size: 17.2 KB, free: 5.2 GB)
20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave1
in memory (size: 17.2 KB, free: 5.2 GB)
20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave9
in memory (size: 17.2 KB, free: 5.2 GB)
20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave3
in memory (size: 17.2 KB, free: 5.2 GB)
20:18:17 INFO SparkUI: Stopped Spark web UI at http://some-ip-here:4040
20:18:17 INFO DAGScheduler: Stopping DAGScheduler
20:18:17 INFO CoarseMesosSchedulerBackend: Shutting down all executors
20:18:17 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
20:18:17.794598 171 sched.cpp:1591] Asked to stop the driver
20:18:17.794739 143 sched.cpp:835] Stopping framework
20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned with code
20:18:17 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
20:18:17 INFO Utils: path =
already present as root for deletion.
20:18:17 INFO MemoryStore: MemoryStore cleared
20:18:17 INFO BlockManager: BlockManager stopped
20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
20:18:17 INFO SparkContext: Successfully stopped SparkContext
20:18:17 INFO Utils: Shutdown hook called
20:18:17 INFO Utils: Deleting directory
20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down
remote daemon.
20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut
down; proceeding with flushing remote transports.

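One possible reading of the trace above, offered only as a sketch under assumptions this thread does not confirm: the OutOfMemoryError is raised in CartesianRDD.getPartitions, and a cartesian RDD has (left partitions x right partitions) partitions, so chaining cartesian() over four RDDs multiplies their partition counts even though the data is only 100 x 100 x 7 x 1 elements. The Spark configuration docs list the default for spark.default.parallelism as 8 in Mesos fine-grained mode and the total number of executor cores otherwise, which might explain why only coarse mode fails. The arithmetic below is plain Java with no Spark dependency; the class name, the method name and the core count of 64 are hypothetical, and it assumes every input RDD gets the default parallelism as its partition count.

```java
public class CartesianPartitionEstimate {

    // Multiplies all arguments, failing loudly on long overflow.
    // Used both for element counts and for partition counts, because
    // chained cartesian() multiplies each of them across the inputs.
    public static long product(long... values) {
        long total = 1;
        for (long v : values) {
            total = Math.multiplyExact(total, v);
        }
        return total;
    }

    public static void main(String[] args) {
        // Element count of the product of the four RDDs in this thread.
        System.out.println("elements: " + product(100, 100, 7, 1));

        // Assumption: 8 partitions per input RDD (documented default
        // parallelism for Mesos fine-grained mode).
        System.out.println("partitions, fine-grained: " + product(8, 8, 8, 8));

        // Assumption: 64 partitions per input RDD (hypothetical total
        // core count of the cluster in coarse-grained mode).
        System.out.println("partitions, coarse (hypothetical): "
                + product(64, 64, 64, 64));
    }
}
```

If this is the cause, giving the small inputs fewer partitions (for example an explicit numSlices in JavaSparkContext.parallelize, or coalesce before each cartesian) should shrink the partition array the driver has to build.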
On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <t...@mesosphere.io> wrote:

> Hi Utkarsh,
> Just to be sure you originally set coarse to false but then to true? Or is
> it the other way around?
> Also what's the exception/stack trace when the driver crashed?
> Coarse-grained mode pre-starts all the Spark executor backends, so it has
> the least overhead compared to fine-grained mode. There is no single answer
> for which mode you should use; it depends on your use case, otherwise we
> would have removed one of the modes.
> There are quite a few factors that could cause huge GC pauses, but I don't
> think your GC pauses will go away if you switch to standalone.
> Tim
> On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
>> I am running Spark 1.4.1 on mesos.
>> The spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of
>> size 100, 100, 7 and 1 respectively. Let's call it productRDD.
>> Creating "aRdd" requires pulling data from multiple data sources, merging
>> it and creating a tuple of JavaRdd; finally aRdd looks something like this:
>> JavaRDD<Tuple4<A1, A2>>
>> bRdd, cRdd and dRdd are just List<> of values.
>> Then I apply a transformation on productRDD and finally call
>> "saveAsTextFile" to save the result of my transformation.
>> Problem:
>> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine but
>> driver crashes while doing the cartesian but when I do
>> "spark.mesos.coarse=true", the job works like a charm. I am running spark
>> on mesos.
>> Comments:
>> So I wanted to understand what role "spark.mesos.coarse=true" plays
>> in terms of memory and compute performance. My findings look
>> counterintuitive since:
>>    1. "spark.mesos.coarse=true" just runs on 1 mesos task, so there
>>    should be an overhead of spinning up mesos tasks which should impact the
>>    performance.
>>    2. What setting for "spark.mesos.coarse" is recommended for running
>>    spark on mesos? Or is there no best answer and it depends on the use case?
>>    3. Also by setting "spark.mesos.coarse=true", I notice that I get
>>    huge GC pauses even with small dataset but a long running job (but this
>>    can be a separate discussion).
>> Let me know if I am missing something obvious, we are learning spark
>> tuning as we move forward :)
>> --
>> Thanks,
>> -Utkarsh

