Hi Ted
In general I want this application to use all available resources. I just
bumped the driver memory to 2G, and I also bumped the executor memory up to
2G. It will take a couple of hours before I know whether this made a
difference.
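For reference, a minimal sketch of how those settings are typically passed at submit time (the script name and master URL below are placeholders, not taken from this thread):

```shell
# Hypothetical spark-submit invocation; adjust master, deploy mode,
# and application file to match the actual cluster.
spark-submit \
  --driver-memory 2G \
  --executor-memory 2G \
  my_app.py
```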
I am not sure that setting executor memory is a good idea; I am concerned
that this will reduce concurrency.
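The concurrency concern can be made concrete with a little arithmetic (a sketch only: the 6G-per-node figure comes from the earlier message, and the per-executor overhead factor is an assumption based on Spark's default of max(384MB, 10%)):

```python
# Rough sketch: how many executors of a given size fit on one node.
# Spark reserves extra memory per executor beyond the configured heap,
# so a 2G executor actually claims roughly 2G * 1.10.

def executors_per_node(node_mem_gb, executor_mem_gb, overhead_factor=1.10):
    """Approximate count of executors that fit on a single node."""
    return int(node_mem_gb // (executor_mem_gb * overhead_factor))

# With 6G per node, raising executor memory from 1G to 2G roughly
# halves how many executors (and therefore concurrent task slots) fit.
print(executors_per_node(6, 1))  # -> 5
print(executors_per_node(6, 2))  # -> 2
```

So the trade-off is real: bigger executors mean fewer of them per node, and fewer concurrent tasks unless cores per executor are raised to compensate.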
Thanks,
Andy
From: Ted Yu <yuzhih...@gmail.com>
Date: Friday, July 22, 2016 at 2:54 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: Exception in thread "dispatcher-event-loop-1"
java.lang.OutOfMemoryError: Java heap space
> How much heap memory do you give the driver ?
>
> On Fri, Jul 22, 2016 at 2:17 PM, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>> Given that I get a stack trace in my Python notebook, I am guessing the
>> driver is running out of memory?
>>
>> My app is simple: it creates a list of DataFrames from s3:// and counts
>> each one. I would not think this would take a lot of driver memory.
>>
>> I am not running my code locally. It's using 12 cores. Each node has 6G.
>>
>> Any suggestions would be greatly appreciated
>>
>> Andy
>>
>> def work():
>>     constituentDFS = getDataFrames(constituentDataSets)
>>     results = ["{} {}".format(name, constituentDFS[name].count())
>>                for name in constituentDFS]
>>     print(results)
>>     return results
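The counting pattern above can be illustrated with plain Python objects standing in for the DataFrames (the names and lists here are purely illustrative; in the real code each .count() triggers a full Spark job, and the comprehension runs them one after another from the driver):

```python
# Stand-in for a dict of name -> DataFrame; lists play the DataFrame
# role, and len() plays the role of DataFrame.count().
constituentDFS = {"constituents": [1, 2, 3], "prices": [4, 5]}

results = ["{} {}".format(name, len(constituentDFS[name]))
           for name in constituentDFS]
print(results)  # e.g. ['constituents 3', 'prices 2']
```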
>>
>>
>>
>> %timeit -n 1 -r 1 results = work()
>>
>>
>>
>> 16/07/22 17:54:38 WARN TaskSetManager: Stage 146 contains a task of very
>> large size (145 KB). The maximum recommended task size is 100 KB.
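That task-size warning usually means the serialized task closure is carrying something large from the driver (a big local collection, for instance). A minimal, Spark-free sketch of how to check what a captured object weighs (the lookup table here is purely hypothetical):

```python
import pickle

# Hypothetical large driver-side lookup table captured by a task closure.
big_lookup = {i: "value-%d" % i for i in range(20000)}

def task(x):
    # Referencing big_lookup forces it into the serialized closure.
    return big_lookup.get(x)

size_kb = len(pickle.dumps(big_lookup)) / 1024.0
print("captured object: %.0f KB" % size_kb)
# An object this size pushes the task well past Spark's recommended
# 100 KB; sc.broadcast(big_lookup) would ship it once per executor
# instead of once per task.
```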
>>
>> 16/07/22 18:39:47 WARN HeartbeatReceiver: Removing executor 2 with no recent
>> heartbeats: 153037 ms exceeds timeout 12 ms
>>
>> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError:
>> Java heap space
>>     at java.util.jar.Manifest$FastInputStream.<init>(Manifest.java:332)
>>     at java.util.jar.Manifest$FastInputStream.<init>(Manifest.java:327)
>>     at java.util.jar.Manifest.read(Manifest.java:195)
>>     at java.util.jar.Manifest.<init>(Manifest.java:69)
>>     at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)
>>     at java.util.jar.JarFile.getManifest(JarFile.java:180)
>>     at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:944)
>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)
>>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>     at org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:510)
>>     at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:473)
>>     at org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:199)
>>     at org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:195)
>>     at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>>     at org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:195)
>>     at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(HeartbeatReceiver.scala:118)
>>     at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
>>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>>
>> 16/07/22 19:08:29 WARN NettyRpcEnv: Ignored message: true
>>
>>
>