Re: graphframe out of memory
No, I did not. I thought Spark would take care of that itself, since I had put the memory settings into the SparkConf arguments.

On Thu, Sep 7, 2017 at 9:26 PM, Lukas Bradley wrote:
> Did you also increase the size of the heap of the Java app that is
> starting Spark?
>
> https://alvinalexander.com/blog/post/java/java-xmx-xms-memory-heap-size-control

--
I.R
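For readers hitting the same issue: in `local[*]` mode the driver is the application JVM itself, so a heap setting such as `spark.driver.memory` applied through `SparkConf` after the JVM has started cannot enlarge the heap; it has to be set at launch time (e.g. `-Xmx8g`, or `--driver-memory` with spark-submit), as the linked article explains. A minimal stdlib-only check (not from the thread) of what the JVM actually got:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() reports the -Xmx ceiling the JVM was launched with.
        // If this still shows the JVM default after "setting" spark.driver.memory
        // in code, the SparkConf setting never took effect on the driver heap.
        long maxMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Driver JVM max heap: " + maxMiB + " MiB");
    }
}
```

Running this inside the embedded application before building the SparkSession shows immediately whether the 8g requested for the driver is actually available.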
Re: graphframe out of memory
Did you also increase the size of the heap of the Java app that is starting Spark?

https://alvinalexander.com/blog/post/java/java-xmx-xms-memory-heap-size-control

On Thu, Sep 7, 2017 at 12:16 PM, Imran Rajjad wrote:
> I am getting Out of Memory error while running connectedComponents job on
> graph with around 12000 vertices and 134600 edges.
graphframe out of memory
I am getting an Out of Memory error while running a connectedComponents job on a graph with around 12,000 vertices and 134,600 edges. I am running Spark in embedded mode in a standalone Java application, and I have tried to increase the memory, but it seems it is not taking any effect:

    sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[10]")
        .set("spark.executor.memory", "5g")
        .set("spark.driver.memory", "8g")
        .set("spark.driver.maxResultSize", "1g")
        .set("spark.sql.warehouse.dir", "file:///d:/spark/tmp")
        .set("hadoop.home.dir", "file:///D:/spark-2.1.0-bin-hadoop2.7/bin");

    spark = SparkSession.builder().config(sparkConf).getOrCreate();
    spark.sparkContext().setLogLevel("ERROR");
    spark.sparkContext().setCheckpointDir("D:/spark/tmp");

The stack trace:

    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at scala.StringContext.standardInterpolator(StringContext.scala:126)
        at scala.StringContext.s(StringContext.scala:95)
        at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:230)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54)
        at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
        at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
        at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2390)
        at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2390)
        at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
        at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2390)
        at org.apache.spark.sql.Dataset.collect(Dataset.scala:2366)
        at org.graphframes.lib.ConnectedComponents$.skewedJoin(ConnectedComponents.scala:239)
        at org.graphframes.lib.ConnectedComponents$.org$graphframes$lib$ConnectedComponents$$run(ConnectedComponents.scala:308)
        at org.graphframes.lib.ConnectedComponents.run(ConnectedComponents.scala:139)

GraphFrame version is 0.5.0 and Spark version is 2.1.1.

regards,
Imran

--
I.R
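One possible mitigation, not suggested in the thread itself: the trace shows the OOM inside `ConnectedComponents$.skewedJoin`, i.e. during a driver-side `collect()` performed by the GraphFrames connected-components algorithm. A hedged sketch, assuming a `GraphFrame g` is already built and that this GraphFrames version exposes the `setAlgorithm` knob on the `connectedComponents()` builder:

    // Sketch, not the poster's code: switch to the GraphX-based implementation,
    // which avoids the skewed-join collect path shown in the stack trace.
    // (Comes at the cost of the older algorithm's scalability characteristics.)
    Dataset<Row> components = g.connectedComponents()
            .setAlgorithm("graphx")
            .run();

This only sidesteps the driver-side collect; if the default "graphframes" algorithm is required, the driver heap still has to be raised at JVM launch as discussed above.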