I missed doing a reply-all.

Tim,

spark.mesos.coarse = true doesn't work, and spark.mesos.coarse = false works
(sorry, there was a typo in my last email; I meant: when I do
"spark.mesos.coarse=false", the job works like a charm).

I get this exception with spark.mesos.coarse = true:

15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)
15/09/22 20:18:17 INFO SparkContext: Invoking stop() from shutdown hook
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on some-ip-here:37706 in memory (size: 1964.0 B, free: 2.8 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on mesos-slave10 in memory (size: 1964.0 B, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on some-ip-here:37706 in memory (size: 17.2 KB, free: 2.8 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave105 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave1 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave9 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave3 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO SparkUI: Stopped Spark web UI at http://some-ip-here:4040
15/09/22 20:18:17 INFO DAGScheduler: Stopping DAGScheduler
15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Shutting down all executors
15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
I0922 20:18:17.794598 171 sched.cpp:1591] Asked to stop the driver
I0922 20:18:17.794739 143 sched.cpp:835] Stopping framework '20150803-224832-1577534986-5050-1614-0016'
15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
15/09/22 20:18:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/09/22 20:18:17 INFO Utils: path = /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252/blockmgr-0e0e1a1c-894e-4e79-beac-ead0dff43166, already present as root for deletion.
15/09/22 20:18:17 INFO MemoryStore: MemoryStore cleared
15/09/22 20:18:17 INFO BlockManager: BlockManager stopped
15/09/22 20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
15/09/22 20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/09/22 20:18:17 INFO SparkContext: Successfully stopped SparkContext
15/09/22 20:18:17 INFO Utils: Shutdown hook called
15/09/22 20:18:17 INFO Utils: Deleting directory /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252
15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
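
One possible reading of the log: the heap error is raised on the driver
inside CartesianRDD.getPartitions, which builds one partition object for
every pair of parent partitions, so chained cartesian() calls multiply the
partition counts of their inputs. The sketch below only does that
arithmetic. The class name, the method name, and the per-RDD partition
counts are hypothetical and are not taken from the job.

```java
import java.util.Arrays;

public class CartesianPartitionCount {
    // Partition count of a chain of cartesian() calls:
    // the product of the partition counts of all input RDDs.
    static long nestedCartesianPartitions(int[] partitionCounts) {
        long total = 1L;
        for (int count : partitionCounts) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            total = Math.multiplyExact(total, (long) count);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical partition counts for aRdd, bRdd, cRdd, dRdd.
        int[] few = {8, 8, 8, 8};
        int[] many = {64, 64, 64, 64};
        System.out.println(Arrays.toString(few) + " -> " + nestedCartesianPartitions(few));
        System.out.println(Arrays.toString(many) + " -> " + nestedCartesianPartitions(many));
    }
}
```

If the inputs end up with more partitions in one mode than in the other
(for example because the default parallelism differs), the product can grow
from thousands to millions of partition objects even though the RDDs hold
only 100, 100, 7 and 1 elements. Checking the partition count of each input
before the cartesian, and coalescing the small ones, may be worth trying.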




On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <t...@mesosphere.io> wrote:

> Hi Utkarsh,
>
> Just to be sure: you originally set coarse to false and then to true? Or is
> it the other way around?
>
> Also what's the exception/stack trace when the driver crashed?
>
> Coarse-grained mode pre-starts all the Spark executor backends, so it has
> less overhead than fine-grained mode. There is no single answer for which
> mode you should use; it depends on your use case, otherwise we would have
> removed one of the modes.
>
> There are quite a few factors that could cause huge GC pauses, but I don't
> think your GC pauses would go away if you switched to standalone.
>
> Tim
>
> On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
>
>> I am running Spark 1.4.1 on mesos.
>>
>> The spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of
>> size 100, 100, 7 and 1 respectively. Let's call it productRDD.
>>
>> Creating "aRdd" requires pulling data from multiple data sources, merging
>> it, and creating a JavaRDD of tuples; in the end aRdd looks something like this:
>> JavaRDD<Tuple4<A1, A2>>
>> bRdd, cRdd and dRdd are just List<> of values.
>>
>> Then I apply a transformation on productRDD and finally call
>> "saveAsTextFile" to save the result of my transformation.
>>
>> Problem:
>> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine but
>> driver crashes while doing the cartesian but when I do
>> "spark.mesos.coarse=true", the job works like a charm. I am running spark
>> on mesos.
>>
>> Comments:
>> So I wanted to understand what role "spark.mesos.coarse=true" plays
>> in terms of memory and compute performance. My findings look
>> counterintuitive since:
>>
>>    1. "spark.mesos.coarse=true" just runs on 1 mesos task, so there
>>    should be an overhead of spinning up mesos tasks which should impact the
>>    performance.
>>    2. What config for "spark.mesos.coarse" is recommended for running spark
>>    on mesos? Or is there no best answer and it depends on the use case?
>>    3. Also, by setting "spark.mesos.coarse=true", I notice that I get
>>    huge GC pauses even with a small dataset but a long-running job (but this
>>    can be a separate discussion).
>>
>> Let me know if I am missing something obvious; we are learning spark
>> tuning as we move forward :)
>>
>> --
>> Thanks,
>> -Utkarsh
>>
>
>


-- 
Thanks,
-Utkarsh
