I missed doing a reply-all earlier.

Tim,
spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false works (sorry, there was a typo in my last email; I meant: when I do "spark.mesos.coarse=false", the job works like a charm).

I get this exception with spark.mesos.coarse = true:

15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)
15/09/22 20:18:17 INFO SparkContext: Invoking stop() from shutdown hook
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on some-ip-here:37706 in memory (size: 1964.0 B, free: 2.8 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on mesos-slave10 in memory (size: 1964.0 B, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on some-ip-here:37706 in memory (size: 17.2 KB, free: 2.8 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave105 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave1 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave9 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave3 in memory (size: 17.2 KB, free: 5.2 GB)
15/09/22 20:18:17 INFO SparkUI: Stopped Spark web UI at http://some-ip-here:4040
15/09/22 20:18:17 INFO DAGScheduler: Stopping DAGScheduler
15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Shutting down all executors
15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
I0922 20:18:17.794598 171 sched.cpp:1591] Asked to stop the driver
I0922 20:18:17.794739 143 sched.cpp:835] Stopping framework '20150803-224832-1577534986-5050-1614-0016'
15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
15/09/22 20:18:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/09/22 20:18:17 INFO Utils: path = /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252/blockmgr-0e0e1a1c-894e-4e79-beac-ead0dff43166, already present as root for deletion.
15/09/22 20:18:17 INFO MemoryStore: MemoryStore cleared
15/09/22 20:18:17 INFO BlockManager: BlockManager stopped
15/09/22 20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
15/09/22 20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/09/22 20:18:17 INFO SparkContext: Successfully stopped SparkContext
15/09/22 20:18:17 INFO Utils: Shutdown hook called
15/09/22 20:18:17 INFO Utils: Deleting directory /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252
15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <t...@mesosphere.io> wrote:

> Hi Utkarsh,
>
> Just to be sure, you originally set coarse to false but then to true? Or is
> it the other way around?
>
> Also, what's the exception/stack trace when the driver crashed?
>
> Coarse grain mode pre-starts all the Spark executor backends, so it has the
> least overhead compared to fine grain. There is no single answer for which
> mode you should use, otherwise we would have removed one of those modes,
> since it depends on your use case.
>
> There are quite a few factors that could cause huge GC pauses, but I don't
> think your GC pauses will go away if you switch to standalone.
>
> Tim
>
> On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
>
>> I am running Spark 1.4.1 on Mesos.
>>
>> The Spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of
>> size 100, 100, 7 and 1 respectively. Let's call it productRDD.
>>
>> Creation of "aRdd" needs data pulled from multiple data sources, merging it
>> and creating a tuple of JavaRdd; finally aRDD looks something like this:
>> JavaRDD<Tuple4<A1, A2>>
>> bRdd, cRdd and dRdd are just List<> of values.
>>
>> Then I apply a transformation on productRDD and finally call
>> "saveAsTextFile" to save the result of my transformation.
>>
>> Problem:
>> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine but
>> the driver crashes while doing the cartesian, but when I do
>> "spark.mesos.coarse=true", the job works like a charm. I am running Spark
>> on Mesos.
>>
>> Comments:
>> So I wanted to understand what role "spark.mesos.coarse=true" plays
>> in terms of memory and compute performance. My findings look
>> counterintuitive since:
>>
>> 1. "spark.mesos.coarse=true" just runs on 1 mesos task, so there
>> should be an overhead of spinning up mesos tasks which should impact the
>> performance.
>> 2. What config for "spark.mesos.coarse" is recommended for running Spark
>> on Mesos? Or is there no best answer and it depends on the use case?
>> 3. Also by setting "spark.mesos.coarse=true", I notice that I get
>> huge GC pauses even with a small dataset but a long-running job (but this
>> can be a separate discussion).
>>
>> Let me know if I am missing something obvious, we are learning Spark
>> tuning as we move forward :)
>>
>> --
>> Thanks,
>> -Utkarsh
>>
>
>

--
Thanks,
-Utkarsh
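
P.S. A note on the trace: the OutOfMemoryError is raised on the driver inside CartesianRDD.getPartitions, and the result of a cartesian has one partition per pair of parent partitions, so a four-way cartesian multiplies four partition counts together. One possible reading (an assumption, not something confirmed in this thread) is that the input RDDs end up with more partitions in coarse-grained mode, which would make that product large even though the data itself is tiny. The sketch below is plain Python arithmetic with no Spark dependency; the helper names and the partition counts of 64 are hypothetical, since the real partition counts are not given above.

```python
from functools import reduce
from operator import mul


def cartesian_partition_count(partition_counts):
    """Partitions of a chained cartesian: product of the inputs' partition counts."""
    return reduce(mul, partition_counts, 1)


def cartesian_element_count(sizes):
    """Elements of a chained cartesian: product of the inputs' element counts."""
    return reduce(mul, sizes, 1)


# Element counts from the original post: aRdd, bRdd, cRdd, dRdd.
sizes = [100, 100, 7, 1]

# Hypothetical partition counts (assumption; not stated in the thread).
all_at_64 = [64, 64, 64, 64]
small_rdds_at_1 = [64, 1, 1, 1]

print("elements in the product:", cartesian_element_count(sizes))
print("partitions, all four at 64:", cartesian_partition_count(all_at_64))
print("partitions, small RDDs at 1:", cartesian_partition_count(small_rdds_at_1))
```

If this is what is happening, checking the partition count of each input before the cartesian, and calling coalesce(1) on the three small RDDs (or keeping them as plain lists and mapping over aRdd), would be a cheap experiment to try in both modes.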