Re: Graphframes pattern causing java heap space errors
Thanks Ted for the input. I was able to get it working with the pyspark shell, but the same job submitted via spark-submit (in either client or cluster deploy mode) fails with these errors:

~
java.lang.OutOfMemoryError: Java heap space
	at java.lang.Object.clone(Native Method)
	at akka.util.CompactByteString$.apply(ByteString.scala:410)
	at akka.util.ByteString$.apply(ByteString.scala:22)
	at akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
	at akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
	at akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
	at akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:179)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2219)
	at java.util.ArrayList.grow(ArrayList.java:242)
	at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
	at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
	at java.util.ArrayList.add(ArrayList.java:440)
	at com.esotericsoftware.kryo.util.MapReferenceResolver.nextReadId(MapReferenceResolver.java:33)
	at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:766)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
	at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:275)
	at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
	at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
~

When using pyspark, I'm passing the options on the command line:

pyspark --master yarn --deploy-mode client --driver-memory 6g --executor-memory 8g --executor-cores 4

while I'm setting the same options via conf.set() in the script that I'm running with spark-submit.

I do notice the options being picked up correctly under the Environment tab, but it's not clear why pyspark succeeds while spark-submit fails. Is there any difference in how these two ways run the job?

Thanks!

On Sun, Apr 10, 2016 at 4:28 AM, Ted Yu wrote:
> Looks like the exception occurred on driver.
>
> Consider increasing the values for the following config:
>
> conf.set("spark.driver.memory", "10240m")
> conf.set("spark.driver.maxResultSize", "2g")
>
> Cheers
>
> On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev wrote:
>> I'm running it via pyspark against yarn in client deploy mode. I do
>> notice in the spark web ui under the Environment tab all the options I've set,
>> so I'm guessing these are accepted.
>>
>> On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski wrote:
>>> Hi,
>>>
>>> (I haven't played with GraphFrames)
>>>
>>> What's your `sc.master`? How do you run your application --
>>> spark-submit or java -jar or sbt run or...? The reason I'm asking is
>>> that a few options might not be in use whatsoever, e.g.
>>> spark.driver.memory and spark.executor.memory in local mode.
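One likely explanation for the pyspark-vs-spark-submit difference, worth checking against the Spark docs for your version: in client mode, spark.driver.memory cannot take effect when set via conf.set() inside the application, because the driver JVM has already started by the time that code runs; it has to reach the launcher via command-line flags or spark-defaults.conf, which the pyspark invocation above does and the spark-submit script does not. A minimal sketch of that mapping; the helper name is my own invention, not a Spark API:

```python
def conf_to_submit_args(conf):
    """Turn a dict of Spark settings into spark-submit arguments.

    Hypothetical helper (name and approach are illustrative, not part
    of Spark). The point: driver-side settings such as
    spark.driver.memory must reach the launcher before the driver JVM
    starts, so conf.set() inside the script is too late in client mode.
    """
    # Settings that have dedicated launcher flags.
    dedicated = {
        "spark.driver.memory": "--driver-memory",
        "spark.executor.memory": "--executor-memory",
        "spark.executor.cores": "--executor-cores",
    }
    args = []
    for key, value in conf.items():
        if key in dedicated:
            args += [dedicated[key], str(value)]
        else:
            # Everything else can go through the generic --conf flag.
            args += ["--conf", "%s=%s" % (key, value)]
    return args

# e.g. the settings discussed in this thread:
print(" ".join(conf_to_submit_args({
    "spark.driver.memory": "10240m",
    "spark.driver.maxResultSize": "2g",
})))
# --driver-memory 10240m --conf spark.driver.maxResultSize=2g
```

Passing these flags to spark-submit itself (rather than calling conf.set() in the script) should make it behave the same as the working pyspark shell invocation.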
Re: Graphframes pattern causing java heap space errors
Looks like the exception occurred on driver.

Consider increasing the values for the following config:

conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")

Cheers

On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev wrote:
> I'm running it via pyspark against yarn in client deploy mode. I do notice
> in the spark web ui under the Environment tab all the options I've set, so I'm
> guessing these are accepted.
>
> On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski wrote:
>> Hi,
>>
>> (I haven't played with GraphFrames)
>>
>> What's your `sc.master`? How do you run your application --
>> spark-submit or java -jar or sbt run or...? The reason I'm asking is
>> that a few options might not be in use whatsoever, e.g.
>> spark.driver.memory and spark.executor.memory in local mode.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>>
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev wrote:
>>> I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M
>>> (60mb) edges:
>>>
>>> tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
>>>
>>> I keep running into Java heap space errors:
>>>
>>> [stack trace snipped; see the original message below]
>>>
>>> Here is my config:
>>>
>>> conf.set("spark.executor.memory", "8192m")
>>> conf.set("spark.executor.cores", 4)
>>> conf.set("spark.driver.memory", "10240m")
>>> conf.set("spark.driver.maxResultSize", "2g")
>>> conf.set("spark.kryoserializer.buffer.max", "1024mb")
>>>
>>> Wanted to know if there are any other configs to tweak?
>>>
>>> Thanks!
Re: Graphframes pattern causing java heap space errors
I'm running it via pyspark against yarn in client deploy mode. I do notice
in the spark web ui under the Environment tab all the options I've set, so I'm
guessing these are accepted.

On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski wrote:
> Hi,
>
> (I haven't played with GraphFrames)
>
> What's your `sc.master`? How do you run your application --
> spark-submit or java -jar or sbt run or...? The reason I'm asking is
> that a few options might not be in use whatsoever, e.g.
> spark.driver.memory and spark.executor.memory in local mode.
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev wrote:
>> I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M
>> (60mb) edges:
>>
>> tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
>>
>> I keep running into Java heap space errors:
>>
>> [stack trace snipped; see the original message below]
>>
>> Here is my config:
>>
>> conf.set("spark.executor.memory", "8192m")
>> conf.set("spark.executor.cores", 4)
>> conf.set("spark.driver.memory", "10240m")
>> conf.set("spark.driver.maxResultSize", "2g")
>> conf.set("spark.kryoserializer.buffer.max", "1024mb")
>>
>> Wanted to know if there are any other configs to tweak?
>>
>> Thanks!
Re: Graphframes pattern causing java heap space errors
Hi,

(I haven't played with GraphFrames)

What's your `sc.master`? How do you run your application --
spark-submit or java -jar or sbt run or...? The reason I'm asking is
that a few options might not be in use whatsoever, e.g.
spark.driver.memory and spark.executor.memory in local mode.

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev wrote:
> I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M (60mb)
> edges:
>
> tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
>
> I keep running into Java heap space errors:
>
> [stack trace snipped; see the original message below]
>
> Here is my config:
>
> conf.set("spark.executor.memory", "8192m")
> conf.set("spark.executor.cores", 4)
> conf.set("spark.driver.memory", "10240m")
> conf.set("spark.driver.maxResultSize", "2g")
> conf.set("spark.kryoserializer.buffer.max", "1024mb")
>
> Wanted to know if there are any other configs to tweak?
>
> Thanks!
Graphframes pattern causing java heap space errors
I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M (60mb) edges:

tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")

I keep running into Java heap space errors:

~
ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
[sparkDriver-akka.actor.default-dispatcher-33] shutting down ActorSystem
[sparkDriver]
java.lang.OutOfMemoryError: Java heap space
	at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:90)
	at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:88)
	at scala.Array$.ofDim(Array.scala:218)
	at akka.util.ByteIterator.toArray(ByteIterator.scala:462)
	at akka.util.ByteString.toArray(ByteString.scala:321)
	at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
	at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:513)
	at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:357)
	at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:352)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
	at akka.actor.FSM$class.processEvent(FSM.scala:595)
	at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:220)
	at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
	at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
~

Here is my config:

conf.set("spark.executor.memory", "8192m")
conf.set("spark.executor.cores", 4)
conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")
conf.set("spark.kryoserializer.buffer.max", "1024mb")

Wanted to know if there are any other configs to tweak?

Thanks!
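A side note on why this particular motif is so memory-hungry: the number of matches grows roughly with the product of vertex in- and out-degrees, so even a modest graph can produce a result set far larger than the input. A toy, in-memory counter (my own illustration; it does not reproduce GraphFrames' actual matching semantics or API) for tuples matching the pattern:

```python
from collections import defaultdict

def count_motif_matches(edges):
    """Count (a, b, c, d) tuples matching (a)-[]->(b); (c)-[]->(b); (c)-[]->(d).

    Toy counter to illustrate result-set blowup; vertices are not
    required to be distinct, mirroring the unconstrained pattern.
    """
    out_nbrs = defaultdict(set)  # vertex -> set of successors
    in_nbrs = defaultdict(set)   # vertex -> set of predecessors
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)

    total = 0
    for b, preds in in_nbrs.items():
        for c in preds:  # every c with an edge c -> b
            # any predecessor of b can play a; any successor of c can play d
            total += len(preds) * len(out_nbrs[c])
    return total

# Example: just three edges (1->3, 2->3, 2->4) already admit 8 matches.
print(count_motif_matches([(1, 3), (2, 3), (2, 4)]))  # 8
```

Since each match carries full vertex and edge attributes back through the query, collecting or serializing that blown-up result is a plausible source of the driver-side OutOfMemoryError, independent of the config values.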