Re: OutOfMemoryError - When saving Word2Vec
Hi Sharad,

What's your vocabulary size and vector length for Word2Vec?

Regards,
Yuhao

2016-06-13 20:04 GMT+08:00 sharad82:
> Is this the right forum to post Spark related issues? I have tried this
> forum along with StackOverflow but not seeing any response.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-When-saving-Word2Vec-tp27142p27151.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
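To show why the vocabulary size and vector length matter here, the following is a rough Python sketch that multiplies the two and compares the result with the largest length a single JVM array can have (arrays are indexed by a 32-bit signed int). The function name, the 4-bytes-per-value figure, and the sample numbers are illustrative assumptions, not values reported in this thread.

```python
# Upper bound on the length of any single JVM array (32-bit signed index).
JVM_MAX_ARRAY_LENGTH = 2**31 - 1


def word2vec_footprint(vocab_size, vector_size, bytes_per_value=4):
    """Estimate the raw size of a word-vector table.

    bytes_per_value=4 assumes single-precision floats (an assumption).
    """
    total_values = vocab_size * vector_size
    return {
        "total_values": total_values,
        "raw_bytes": total_values * bytes_per_value,
        # True if all values could sit in one flat array.
        "fits_in_one_array": total_values <= JVM_MAX_ARRAY_LENGTH,
    }


# Hypothetical figures, only to show the order of magnitude.
estimate = word2vec_footprint(vocab_size=3_000_000, vector_size=1000)
print(estimate)
```

If the product of the two numbers approaches 2^31, any code path that puts the whole model into one array or one string is at risk, however much heap is configured.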
Re: OutOfMemoryError - When saving Word2Vec
Is this the right forum to post Spark related issues ? I have tried this forum along with StackOverflow but not seeing any response.
Re: OutOfMemoryError - When saving Word2Vec
Hi Sharad,

The array that you (or the serializer) are trying to allocate is simply too big for the JVM. You can also split your input further by increasing parallelism.

The following is a good explanation:
https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit

Regards,
Vaquar Khan

On Sun, Jun 12, 2016 at 5:08 AM, sharad82 wrote:
> When trying to save the word2vec model trained over 10G of data leads to
> below OOM error.
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> Spark Version: 1.6
> spark.dynamicAllocation.enable false
> spark.executor.memory 75g
> spark.driver.memory 150g
> spark.driver.cores 10
>
> [...]
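The stack trace in this thread fails inside StringBuilder.append, under expandCapacity and Arrays.copyOf, so the oversized array appears to be the backing buffer of a string being built. The Python sketch below models a growable buffer that roughly doubles on each expansion. The growth rule (new = 2 * old + 2), the starting capacity of 16, and the cap at 2^31 - 1 are assumptions made for illustration, and the exact policy may differ between JDK versions.

```python
# Largest value of a 32-bit signed int, the upper bound on JVM array length.
INT_MAX = 2**31 - 1


def growth_steps(needed_chars, start=16):
    """Return the capacities requested while growing to hold needed_chars.

    Assumed policy: each expansion asks for 2 * old + 2 slots, capped at
    INT_MAX once doubling would overflow a 32-bit int.
    """
    capacities = [start]
    while capacities[-1] < needed_chars:
        doubled = capacities[-1] * 2 + 2
        capacities.append(min(doubled, INT_MAX))
        if capacities[-1] == INT_MAX:
            break  # nothing larger can be requested
    return capacities


# A string of 1.3 billion characters is below INT_MAX, yet under this
# policy the final expansion already asks for an INT_MAX-sized array.
print(growth_steps(1_300_000_000)[-1] == INT_MAX)
```

Under these assumptions the request can hit the array-length ceiling well before the string itself reaches 2^31 characters, and a JVM may reject an array of that length outright.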
OutOfMemoryError - When saving Word2Vec
Saving a word2vec model trained on 10G of data fails with the OOM error below.

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Spark Version: 1.6
spark.dynamicAllocation.enable false
spark.executor.memory 75g
spark.driver.memory 150g
spark.driver.cores 10

Full Stack Trace:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at scala.StringContext.standardInterpolator(StringContext.scala:122)
    at scala.StringContext.s(StringContext.scala:90)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:334)
    at org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:271)
    at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:91)
    at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
    at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:172)

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-When-saving-Word2Vec-tp27142.html
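As a sanity check on the memory settings above, this small Python sketch compares the configured driver heap with the largest single character array a JVM could hold. The helper name, the reading of "g" as GiB, and the 2 bytes per character are assumptions for illustration.

```python
# Binary size suffixes, assuming "g" means GiB as in JVM-style settings.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_memory(setting):
    """Convert a setting such as '150g' into a number of bytes."""
    value, unit = setting[:-1], setting[-1].lower()
    return int(value) * UNITS[unit]


MAX_ARRAY_LENGTH = 2**31 - 1  # JVM arrays are indexed by a 32-bit signed int
BYTES_PER_CHAR = 2            # assumption: 16-bit characters

max_char_array_bytes = MAX_ARRAY_LENGTH * BYTES_PER_CHAR
driver_heap = parse_memory("150g")  # spark.driver.memory from the post

# How many maximum-length character arrays would fit in the driver heap.
print(driver_heap // max_char_array_bytes)
```

The largest possible character array is only about 4 GiB, a small fraction of the 150g heap, which suggests the error reflects the per-array length limit and not a shortage of heap. Raising spark.driver.memory further would therefore not be expected to help.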