Re: Graphframes pattern causing java heap space errors

2016-04-10 Thread Buntu Dev
Thanks, Ted, for the input. I was able to get it working with the pyspark
shell, but the same job submitted via 'spark-submit' in client or cluster
deploy mode ends up with these errors:

~
java.lang.OutOfMemoryError: Java heap space
at java.lang.Object.clone(Native Method)
at akka.util.CompactByteString$.apply(ByteString.scala:410)
at akka.util.ByteString$.apply(ByteString.scala:22)
at akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
at akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
at akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
at akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:179)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2219)
at java.util.ArrayList.grow(ArrayList.java:242)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
at java.util.ArrayList.add(ArrayList.java:440)
at com.esotericsoftware.kryo.util.MapReferenceResolver.nextReadId(MapReferenceResolver.java:33)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:766)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:275)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


When using pyspark, I'm passing the options via the command line:

pyspark --master yarn --deploy-mode client --driver-memory 6g
--executor-memory 8g --executor-cores 4

while with spark-submit I'm setting the same options via conf.set() in the
script.

I do notice the options being picked up correctly under the Environment tab,
but it's not clear why pyspark succeeds while spark-submit fails. Is there
any difference in how these two ways of running the job pick up the config?
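One difference that can matter here: with the pyspark flags above, the driver
memory is applied before the driver JVM starts, whereas spark.driver.memory
set via conf.set() inside a script runs only after the driver JVM is already
up in client mode, so it has no effect there. A sketch of passing the same
options on the spark-submit command line instead ("myjob.py" is a placeholder
for the actual script name):

```shell
# Sketch: same options as the pyspark invocation, passed to spark-submit
# on the command line so they take effect before the driver JVM starts.
# "myjob.py" is a placeholder for the actual script.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 6g \
  --executor-memory 8g \
  --executor-cores 4 \
  myjob.py
```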

Thanks!


On Sun, Apr 10, 2016 at 4:28 AM, Ted Yu  wrote:

> Looks like the exception occurred on driver.
>
> Consider increasing the values for the following config:
>
> conf.set("spark.driver.memory", "10240m")
> conf.set("spark.driver.maxResultSize", "2g")
>
> Cheers
>
> On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev  wrote:
>
>> I'm running it via pyspark against yarn in client deploy mode. I do
>> notice in the spark web ui under Environment tab all the options I've set,
>> so I'm guessing these are accepted.
>>
>> On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski  wrote:
>>
>>> Hi,
>>>
>>> (I haven't played with GraphFrames)
>>>
>>> What's your `sc.master`? How do you run your application --
>>> spark-submit or java -jar or sbt run or...? The reason I'm asking is
>>> that a few options might not be in use whatsoever, e.g.
>>> spark.driver.memory and spark.executor.memory in local mode.

Re: Graphframes pattern causing java heap space errors

2016-04-10 Thread Ted Yu
Looks like the exception occurred on driver.

Consider increasing the values for the following config:

conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")
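One caveat if the job goes through spark-submit in client deploy mode:
spark.driver.memory set from inside the script comes too late to resize the
already-running driver JVM, so it is safer to pass these values at launch.
A sketch, with "myjob.py" standing in for the actual script:

```shell
# Sketch: the same driver settings applied at launch via --conf, which
# works in both client and cluster deploy modes. "myjob.py" is a placeholder.
spark-submit \
  --conf spark.driver.memory=10g \
  --conf spark.driver.maxResultSize=2g \
  myjob.py
```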

Cheers

On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev  wrote:

> I'm running it via pyspark against yarn in client deploy mode. I do notice
> in the spark web ui under Environment tab all the options I've set, so I'm
> guessing these are accepted.
>
> On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski  wrote:
>
>> Hi,
>>
>> (I haven't played with GraphFrames)
>>
>> What's your `sc.master`? How do you run your application --
>> spark-submit or java -jar or sbt run or...? The reason I'm asking is
>> that a few options might not be in use whatsoever, e.g.
>> spark.driver.memory and spark.executor.memory in local mode.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev  wrote:
>> > I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M
>> (60mb)
>> > edges:
>> >
>> >  tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
>> >
>> > I keep running into Java heap space errors:
>> >
>> > ~
>> >
>> > ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
>> > [sparkDriver-akka.actor.default-dispatcher-33] shutting down ActorSystem
>> > [sparkDriver]
>> > java.lang.OutOfMemoryError: Java heap space
>> > at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:90)
>> > at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:88)
>> > at scala.Array$.ofDim(Array.scala:218)
>> > at akka.util.ByteIterator.toArray(ByteIterator.scala:462)
>> > at akka.util.ByteString.toArray(ByteString.scala:321)
>> > at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
>> > at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:513)
>> > at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:357)
>> > at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:352)
>> > at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>> > at akka.actor.FSM$class.processEvent(FSM.scala:595)
>> > at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:220)
>> > at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
>> > at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
>> > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> > at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> > at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> > at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> > at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> > at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> >
>> > ~
>> >
>> >
>> > Here is my config:
>> >
>> > conf.set("spark.executor.memory", "8192m")
>> > conf.set("spark.executor.cores", 4)
>> > conf.set("spark.driver.memory", "10240m")
>> > conf.set("spark.driver.maxResultSize", "2g")
>> > conf.set("spark.kryoserializer.buffer.max", "1024mb")
>> >
>> >
>> > Wanted to know if there are any other configs to tweak?
>> >
>> >
>> > Thanks!
>>
>
>


Re: Graphframes pattern causing java heap space errors

2016-04-09 Thread Buntu Dev
I'm running it via pyspark against yarn in client deploy mode. I do notice
in the spark web ui under Environment tab all the options I've set, so I'm
guessing these are accepted.

On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski  wrote:

> Hi,
>
> (I haven't played with GraphFrames)
>
> What's your `sc.master`? How do you run your application --
> spark-submit or java -jar or sbt run or...? The reason I'm asking is
> that a few options might not be in use whatsoever, e.g.
> spark.driver.memory and spark.executor.memory in local mode.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev  wrote:
> > I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M
> (60mb)
> > edges:
> >
> >  tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
> >
> > I keep running into Java heap space errors:
> >
> > ~
> >
> > ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
> > [sparkDriver-akka.actor.default-dispatcher-33] shutting down ActorSystem
> > [sparkDriver]
> > java.lang.OutOfMemoryError: Java heap space
> > at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:90)
> > at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:88)
> > at scala.Array$.ofDim(Array.scala:218)
> > at akka.util.ByteIterator.toArray(ByteIterator.scala:462)
> > at akka.util.ByteString.toArray(ByteString.scala:321)
> > at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
> > at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:513)
> > at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:357)
> > at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:352)
> > at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
> > at akka.actor.FSM$class.processEvent(FSM.scala:595)
> > at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:220)
> > at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
> > at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
> > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> > at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> > at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> > at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
> > ~
> >
> >
> > Here is my config:
> >
> > conf.set("spark.executor.memory", "8192m")
> > conf.set("spark.executor.cores", 4)
> > conf.set("spark.driver.memory", "10240m")
> > conf.set("spark.driver.maxResultSize", "2g")
> > conf.set("spark.kryoserializer.buffer.max", "1024mb")
> >
> >
> > Wanted to know if there are any other configs to tweak?
> >
> >
> > Thanks!
>


Re: Graphframes pattern causing java heap space errors

2016-04-09 Thread Jacek Laskowski
Hi,

(I haven't played with GraphFrames)

What's your `sc.master`? How do you run your application --
spark-submit or java -jar or sbt run or...? The reason I'm asking is
that a few options might not be in use whatsoever, e.g.
spark.driver.memory and spark.executor.memory in local mode.

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev  wrote:
> I'm running this motif pattern against 1.5M vertices (5.5mb) and 10M (60mb)
> edges:
>
>  tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
>
> I keep running into Java heap space errors:
>
> ~
>
> ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.actor.default-dispatcher-33] shutting down ActorSystem
> [sparkDriver]
> java.lang.OutOfMemoryError: Java heap space
> at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:90)
> at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:88)
> at scala.Array$.ofDim(Array.scala:218)
> at akka.util.ByteIterator.toArray(ByteIterator.scala:462)
> at akka.util.ByteString.toArray(ByteString.scala:321)
> at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
> at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:513)
> at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:357)
> at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:352)
> at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
> at akka.actor.FSM$class.processEvent(FSM.scala:595)
> at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:220)
> at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
> at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> ~
>
>
> Here is my config:
>
> conf.set("spark.executor.memory", "8192m")
> conf.set("spark.executor.cores", 4)
> conf.set("spark.driver.memory", "10240m")
> conf.set("spark.driver.maxResultSize", "2g")
> conf.set("spark.kryoserializer.buffer.max", "1024mb")
>
>
> Wanted to know if there are any other configs to tweak?
>
>
> Thanks!

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Graphframes pattern causing java heap space errors

2016-04-09 Thread Buntu Dev
I'm running this motif pattern against 1.5M vertices (5.5 MB) and 10M
edges (60 MB):

 tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
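For context, a minimal sketch of the kind of setup this query assumes. The
data below is hypothetical (not the actual 1.5M/10M dataset), and it assumes
a running SparkContext `sc` with the graphframes package available, plus the
usual GraphFrame "id"/"src"/"dst" column convention:

```python
# Minimal sketch with tiny hypothetical data, just to illustrate the motif
# query; assumes an existing SparkContext `sc` and the graphframes package.
from pyspark.sql import SQLContext
from graphframes import GraphFrame

sqlContext = SQLContext(sc)

# GraphFrame convention: vertices need an "id" column,
# edges need "src" and "dst" columns.
vertices = sqlContext.createDataFrame([(1,), (2,), (3,), (4,)], ["id"])
edges = sqlContext.createDataFrame([(1, 2), (3, 2), (3, 4)], ["src", "dst"])

tgraph = GraphFrame(vertices, edges)

# Matches where a and c both point at b, and c also points at d.
matches = tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
matches.show()
```

Note the result set of such a motif can be far larger than the input edge
list, which is one reason the driver can run out of heap when results are
collected.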

I keep running into Java heap space errors:

~

ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
[sparkDriver-akka.actor.default-dispatcher-33] shutting down
ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:90)
at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:88)
at scala.Array$.ofDim(Array.scala:218)
at akka.util.ByteIterator.toArray(ByteIterator.scala:462)
at akka.util.ByteString.toArray(ByteString.scala:321)
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:513)
at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:357)
at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:352)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at akka.actor.FSM$class.processEvent(FSM.scala:595)
at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:220)
at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

~


Here is my config:

conf.set("spark.executor.memory", "8192m")
conf.set("spark.executor.cores", 4)
conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")
conf.set("spark.kryoserializer.buffer.max", "1024mb")


Wanted to know if there are any other configs to tweak?


Thanks!